summaryrefslogtreecommitdiff
path: root/research/flossing/external/julia-1.6.7/share/julia/stdlib/v1.6/Tar
diff options
context:
space:
mode:
Diffstat (limited to 'research/flossing/external/julia-1.6.7/share/julia/stdlib/v1.6/Tar')
-rw-r--r--research/flossing/external/julia-1.6.7/share/julia/stdlib/v1.6/Tar/README.md467
1 files changed, 467 insertions, 0 deletions
diff --git a/research/flossing/external/julia-1.6.7/share/julia/stdlib/v1.6/Tar/README.md b/research/flossing/external/julia-1.6.7/share/julia/stdlib/v1.6/Tar/README.md
new file mode 100644
index 0000000..d9c2e08
--- /dev/null
+++ b/research/flossing/external/julia-1.6.7/share/julia/stdlib/v1.6/Tar/README.md
@@ -0,0 +1,467 @@
+# Tar.jl
+
+[![Build Status](https://travis-ci.org/JuliaIO/Tar.jl.svg?branch=master)](https://travis-ci.org/JuliaIO/Tar.jl)
+[![Codecov](https://codecov.io/gh/JuliaIO/Tar.jl/branch/master/graph/badge.svg)](https://codecov.io/gh/JuliaIO/Tar.jl)
+
+The `Tar` package can list, extract and create POSIX TAR archives ("tarballs")
+as specified in [POSIX
+1003.1-2001](https://pubs.opengroup.org/onlinepubs/9699919799/utilities/pax.html).
+It is designed to support using the TAR format as a mechanism for sending trees
+of files from one system to another, rather than for the historical use case of
+backing up files for restoration to the same system. Because of this design
+goal, `Tar` intentionally ignores much of the metadata included in the TAR
+format, which does not make sense for the data transfer use case. The package
+also does not aim to read or create legacy non-POSIX variants of the TAR format,
+although it does support reading GNU long name and long link extensions.
+
+## API & Usage
+
+The public API of `Tar` includes five functions and one type:
+
+* `create` — creates a tarball from an on-disk file tree
+* `extract` — extracts a tarball to an on-disk file tree
+* `list` — lists the contents of a tarball as a vector of `Header` objects
+* `rewrite` — rewrite a tarball to the standard format `create` produces
+* `tree_hash` — compute a tree hash of the content of a tarball (default: git
+ SHA1)
+* `Header` — struct representing metadata that `Tar` considers important in a
+ TAR entry
+
+None of these are exported, however: the recommended usage is to do `import Tar`
+and then access all of these names fully qualified as `Tar.create`,
+`Tar.extract` and so on.
+
+<!-- BEGIN: copied from inline doc strings -->
+
+### Tar.create
+
+```jl
+create([ predicate, ] dir, [ tarball ]; [ skeleton ]) -> tarball
+```
+* `predicate :: String --> Bool`
+* `dir :: AbstractString`
+* `tarball :: Union{AbstractString, AbstractCmd, IO}`
+* `skeleton :: Union{AbstractString, AbstractCmd, IO}`
+
+Create a tar archive ("tarball") of the directory `dir`. The resulting archive
+is written to the path `tarball` or if no path is specified, a temporary path is
+created and returned by the function call. If `tarball` is an IO object then the
+tarball content is written to that handle instead (the handle is left open).
+
+If a `predicate` function is passed, it is called on each system path that is
+encountered while recursively searching `dir` and `path` is only included in the
+tarball if `predicate(path)` is true. If `predicate(path)` returns false for a
+directory, then the directory is excluded entirely: nothing under that directory
+will be included in the archive.
+
+If the `skeleton` keyword is passed then the file or IO handle given is used as
+a "skeleton" to generate the tarball. You create a skeleton file by passing the
+`skeleton` keyword to the `extract` command. If `create` is called with that
+skeleton file and the extracted files haven't changed, an identical tarball is
+recreated. The `skeleton` and `predicate` arguments cannot be used together.
+
+### Tar.extract
+
+```jl
+extract([ predicate, ] tarball, [ dir ];
+ [ skeleton, ] [ copy_symlinks ]) -> dir
+```
+* `predicate :: Header --> Bool`
+* `tarball :: Union{AbstractString, AbstractCmd, IO}`
+* `dir :: AbstractString`
+* `skeleton :: Union{AbstractString, AbstractCmd, IO}`
+* `copy_symlinks :: Bool`
+
+Extract a tar archive ("tarball") located at the path `tarball` into the
+directory `dir`. If `tarball` is an IO object instead of a path, then the
+archive contents will be read from that IO stream. The archive is extracted to
+`dir` which must either be an existing empty directory or a non-existent path
+which can be created as a new directory. If `dir` is not specified, the archive
+is extracted into a temporary directory which is returned by `extract`.
+
+If a `predicate` function is passed, it is called on each `Header` object that
+is encountered while extracting `tarball` and the entry is only extracted if the
+`predicate(hdr)` is true. This can be used to selectively extract only parts of
+an archive, to skip entries that cause `extract` to throw an error, or to record
+what is extracted during the extraction process.
+
+If the `skeleton` keyword is passed then a "skeleton" of the extracted tarball
+is written to the file or IO handle given. This skeleton file can be used to
+recreate an identical tarball by passing the `skeleton` keyword to the `create`
+function. The `skeleton` and `predicate` arguments cannot be used together.
+
+If `copy_symlinks` is `true` then instead of extracting symbolic links as such,
+they will be extracted as copies of what they link to if they are internal to
+the tarball and if it is possible to do so. Non-internal symlinks, such as a
+link to `/etc/passwd` will not be copied. Symlinks which are in any way cyclic
+will also not be copied and will instead be skipped. By default, `extract` will
+detect whether symlinks can be created in `dir` or not and will automatically
+copy symlinks if they cannot be created.
+
+### Tar.list
+
+```jl
+list(tarball; [ strict = true ]) -> Vector{Header}
+list(callback, tarball; [ strict = true ])
+```
+* `callback :: Header --> Bool`
+* `tarball :: Union{AbstractString, AbstractCmd, IO}`
+* `strict :: Bool`
+
+List the contents of a tar archive ("tarball") located at the path `tarball`. If
+`tarball` is an IO handle, read the tar contents from that stream. Returns a
+vector of `Header` structs. See [`Header`](@ref) for details. If a `callback` is
+provided then instead of returning a vector of headers, the callback is called
+on each `Header`. This can be useful if the number of items in the tarball is
+large or if you want examine items prior to an error in the tarball.
+
+By default `list` will error if it encounters any tarball contents which the
+`extract` function would refuse to extract. With `strict=false` it will skip
+these checks and list all the the contents of the tar file whether `extract`
+would extract them or not. Beware that malicious tarballs can do all sorts of
+crafty and unexpected things to try to trick you into doing something bad.
+
+If the `tarball` argument is a skeleton file (see `extract` and `create`) then
+`list` will detect that from the file header and appropriately list or iterate
+the headers of the skeleton file.
+
+### Tar.rewrite
+
+```jl
+rewrite([ predicate, ], old_tarball, [ new_tarball ]) -> new_tarball
+```
+* `predicate :: Header --> Bool`
+* `old_tarball :: Union{AbstractString, AbstractCmd, IO}`
+* `new_tarball :: Union{AbstractString, AbstractCmd, IO}`
+
+Rewrite `old_tarball` to the standard format that `create` generates, while also
+checking that it doesn't contain anything that would cause `extract` to raise an
+error. This is functionally equivalent to doing
+```jl
+Tar.create(Tar.extract(predicate, old_tarball), new_tarball)
+```
+However, it never extracts anything to disk and instead uses the `seek` function
+to navigate the old tarball's data. If no `new_tarball` argument is passed, the
+new tarball is written to a temporary file whose path is returned.
+
+If a `predicate` function is passed, it is called on each `Header` object that
+is encountered while extracting `old_tarball` and the entry is skipped unless
+`predicate(hdr)` is true. This can be used to selectively rewrite only parts of
+an archive, to skip entries that would cause `extract` to throw an error, or to
+record what content is encountered during the rewrite process.
+
+### Tar.tree_hash
+
+```jl
+tree_hash([ predicate, ] tarball;
+ [ algorithm = "git-sha1", ]
+ [ skip_empty = false ]) -> hash::String
+```
+* `predicate :: Header --> Bool`
+* `tarball :: Union{AbstractString, AbstractCmd, IO}`
+* `algorithm :: AbstractString`
+* `skip_empty :: Bool`
+
+Compute a tree hash value for the file tree that the tarball contains. By
+default, this uses git's tree hashing algorithm with the SHA1 secure hash
+function (like current versions of git). This means that for any tarball whose
+file tree git can represent—i.e. one with only files, symlinks and non-empty
+directories—the hash value computed by this function will be the same as the
+hash value git would compute for that file tree. Note that tarballs can
+represent file trees with empty directories, which git cannot store, and this
+function can generate hashes for those, which will, by default (see `skip_empty`
+below for how to change this behavior), differ from the hash of a tarball which
+omits those empty directories. In short, the hash function agrees with git on
+all trees which git can represent, but extends (in a consistent way) the domain
+of hashable trees to other trees which git cannot represent.
+
+If a `predicate` function is passed, it is called on each `Header` object that
+is encountered while processing `tarball` and an entry is only hashed if
+`predicate(hdr)` is true. This can be used to selectively hash only parts of an
+archive, to skip entries that cause `extract` to throw an error, or to record
+what is extracted during the hashing process.
+
+Currently supported values for `algorithm` are `git-sha1` (the default) and
+`git-sha256`, which uses the same basic algorithm as `git-sha1` but replaces the
+SHA1 hash function with SHA2-256, the hash function that git will transition to
+using in the future (due to known attacks on SHA1). Support for other file tree
+hashing algorithms may be added in the future.
+
+The `skip_empty` option controls whether directories in the tarball which
+recursively contain no files or symlinks are included in the hash or ignored.
+In general, if you are hashing the content of a tarball or a file tree, you care
+about all directories, not just non-empty ones, so including these in the
+computed hash is the default. So why does this function even provide the option
+to skip empty directories? Because git refuses to store empty directories and
+will ignore them if you try to add them to a repo. So if you compute a reference
+tree hash by by adding files to a git repo and then asking git for the tree
+hash, the hash value that you get will match the hash value computed by
+`tree_hash` with `skip_empty=true`. In other words, this option allows
+`tree_hash` to emulate how git would hash a tree with empty directories. If you
+are hashing trees that may contain empty directories (i.e. do not come from a
+git repo), however, it is recommended that you hash them using a tool (such as
+this one) that does not ignore empty directories.
+
+### Tar.Header
+
+The `Header` type is a struct representing the essential metadata for a single
+record in a tar file with this definition:
+```jl
+struct Header
+ path :: String # path relative to the root
+ type :: Symbol # type indicator (see below)
+ mode :: UInt16 # mode/permissions (best viewed in octal)
+ size :: Int64 # size of record data in bytes
+ link :: String # target path of a symlink
+end
+```
+Types are represented with the following symbols: `file`, `hardlink`, `symlink`,
+`chardev`, `blockdev`, `directory`, `fifo`, or for unknown types, the typeflag
+character as a symbol. Note that [`extract`](#Tarextract) refuses to extract
+records types other than `file`, `symlink` and `directory`; [`list`](#Tarlist)
+will only list other kinds of records if called with `strict=false`.
+
+<!-- END: copied from inline doc strings -->
+
+### Compression
+
+It is typical to compress tarballs when saving of transferring them. In the UNIX
+tradition of doing one thing and doing it well, the `Tar` package does not do
+any kind of compression and instead makes it easy to compose it's API with
+external compression tools. The simplest way to read a compressed archive is to
+use a command-line tool to decompress it. For example:
+```jl
+Tar.list(`gzcat $tarball`)
+Tar.extract(`gzcat $tarball`)
+```
+This will spawn the `gzcat $tarball` command, read the uncompressed tarball data
+from the output of that process, and then close the process. Creating a tarball
+with the `gzip` command is nearly as easy:
+```jl
+Tar.create(dir, pipeline(`gzip -9`, tarball))
+```
+This assumes that `dir` is the directory you want to archive and `tarball` is
+the path you want to create as a compressed archive.
+
+If you want to compress or decompress a tarball in the same process, you can
+using various
+[[TranscodingStreams](https://github.com/JuliaIO/TranscodingStreams.jl)
+packages:
+```jl
+using CodecZlib
+
+tar_gz = open(tarball, write=true)
+tar = GzipCompressorStream(tar_gz)
+Tar.create(dir, tar)
+close(tar)
+```
+This assumes that `dir` is the directory you want to archive and `tarball` is
+the path you want to create as a compressed archive. You can decompress
+in-process in a similar manner:
+```jl
+using CodecZlib
+
+tar_gz = open(tarball)
+tar = GzipDecompressorStream(tar_gz)
+dir = Tar.extract(tar)
+close(tar)
+```
+This assumes that `tarball` is the path of the compressed archive you want to
+extract.
+
+### API comparison with command-line tar
+
+It might be helpful to compare the `Tar` API with command-line `tar`. Unlike
+`tar -c` the `Tar.create` function does not include any of the path you tell it
+to bundle in the resulting TAR file: the location of the data is not part of the
+data. Doing `Tar.create(dir, tarball)` is roughly equivalent to running the
+following `tar` command:
+```sh
+tar -f $tarball -C $dir -c $(cd $dir; ls -A)
+```
+In other words, `tar` is told to change into the directory `dir` before
+constructing the tarball and then include all the top-level items in that
+directory without any path prefix. Note that the above command does not fully
+emulate the behavior of `Tar.create`: it does not sort entries in the same order
+and it still records user and group information, modification times and exact
+permissions. Coaxing command-line `tar` programs to omit this non-portable
+information and use a portable (and `git`-compatible sort order) is non-trivial.
+
+Another difference from command-line `tar`: non-empty directories are also
+omitted from the tarballs that `Tar` creates since no metadata is recorded about
+directories aside from the fact that they exist and the existence of non-empty
+directories is already implied by the fact that they contain something else. If,
+in the future, the ability to record metadata about directories is added,
+tarballs will record entries for non-empty directories with non-default
+metadata.
+
+On the extraction side of things, doing `Tar.extract(tarball, dir)` is roughly
+equivalent to the following commands:
+```sh
+test -d $dir || mkdir $dir
+tar -f $tarball -C $dir -mx
+```
+Again, `tar` is told to change into the directory `dir` before extracting the
+tarball and to extract each path relative to that directory. The `-m` option
+tells `tar` to ignore the modification times recorded in the tarball and just
+let files and directories have their natural modification times.
+
+If the current user has elevated privileges, the `tar` command will attempt to
+change the owner and group of files to what is recorded in the tarball, whereas
+`Tar.extract` will never do that. The `tar` command may also try to restore
+permissions without respecting the current `umask` if the user is an
+administrator. Again, `Tar.extract` will never do that—it behaves the same way
+for any users: by ignoring any user/group/permission information, aside from
+whether plain files are executable by their owner or not. To suppress these
+behaviors with GNU tar, you can use the `--no-same-owner` and
+`--no-same-permissions` options; these options are not broadly supported by
+other `tar` commands, which may not have options to support these behaviors.
+
+## Design & Features
+
+Unlike the `tar` command line tool, which was originally designed to archive
+data in order to restore it back to the same system or to a replica thereof, the
+`Tar` package is designed for using the TAR format to transfer trees of files
+and directories from one system to another. This design goal means that some
+metadata fields supported by the TAR format and used by default by historical
+`tar` tools are not used or supported by `Tar`. In short, the choice of features
+and defaults for `Tar` are designed to support transfer of data, rather than
+backup and restoration.
+
+The TAR format can, for example, record the name and ID of the user that owns
+each file. Recording this information makes perfect sense when using tarballs
+for backup: the `tar` program should run as root when restoring data, so it can
+restore the original owner of each file and directory. On the other hand, this
+ownership information is of no use when using the TAR format to transfer data
+from one system to another: the user names and IDs will not generally be the
+same on different systems, and the tool should _not_ be run as `root`, so it
+cannot change the owner of anything it extracts. For data transfer, ownership
+metadata should be disregarded and need not be recorded in the first place.
+
+Similarly, it makes little sense, when using tarballs for data transfer, to copy
+the modification time of each file from the source system. Those time stamps are
+unlikely to be relevant on the destination system, and in some cases, clock skew
+between the systems could mean that time stamps from the source appear to be in
+the future at the destination. This can confuse some programs and may even be
+perceived as an attempted security breach; most `tar` command line tools print
+warnings when extracting files with time stamps from the future. When using the
+TAR format for data transfer, it is better to ignore time stamps and just let
+the extracted contents have natural modification times.
+
+The features and defaults of the `Tar` package are guided by the principle that
+it uses the TAR format for transmitting data, not as a tool for backup and
+restoration. If you want to use the TAR format for archival purposes, you are
+likely better off using a traditional command line tool like [GNU
+tar](https://www.gnu.org/software/tar/). If, on the other hand, you want to use
+the TAR format to transmit data from one system to another, then you've come to
+the right place.
+
+### File Types
+
+Since `Tar` is designed for transmission of file and directory trees, it
+supports only the following file types:
+
+* plain files
+* directories
+* symlinks
+
+The `Tar` package does not support other file types that the TAR format can
+represent, including: hard links, character devices, block devices, and FIFOs.
+If you attempt to create or extract an archive that contains any of these kinds
+of entries, `Tar` will raise an error. You can, however, list the contents of a
+tarball containing other kinds of entries by passing the `strict=false` flag to
+the `list` function; without this option, `list` raises the same error as
+`extract` would.
+
+In the future, optional support may be added for using hard links within
+archives to avoid duplicating identical files.
+
+### Time Stamps
+
+Also in accordance with its design goal as a data transfer tool, the `Tar`
+package does not record or set modification times upon tarball creation and
+extraction. When creating a tarball, it sets the time stamp of each entry to
+`0`, representing the UNIX epoch (Jan 1st, 1970). When extracting a tarball, it
+ignores the time stamps of entries and lets all extracted content have "natural"
+modification times based on when each file or directory is extracted.
+
+In the future, optional support may be added for recording and restoring time
+stamps.
+
+### Users & Groups
+
+`Tar` ignores user and group names and IDs when creating and extracting
+tarballs. This is due to two facts:
+
+* names and IDs on source and destination systems will generally not match;
+* names and IDs can only be changed if `Tar` is run with elevated privileges.
+
+The first fact means that it probably doesn't make sense to try to restore
+ownership when transferring data, while the second fact means that it's probably
+not possible. Accordingly, `Tar` disregards user and group names and IDs when
+creating and extracting tarballs. During creation, the ID fields are recorded as
+`0` and names fields are recorded as the empty string. When extracting a
+tarball, the user and group fields are ignored entirely and all extracted
+content is owned by the current user.
+
+It is unlikely that support will be added for recording or restoring ownership
+of files or directories since that functionality only makes sense when using the
+TAR format for backup, a purpose better served by using a command line `tar`
+tool.
+
+### Permissions
+
+Upon tarball extraction, `Tar` respects the permissions recorded for each file.
+When creating tarball, however, it ignores most permission information and
+normalizes permissions as follows:
+
+* files that are not executable by the owner are archived with mode `0o644`;
+* files that are executable by the owner are archived with mode `0o755`;
+* directories and symlinks are always archived with mode `0o755`.
+
+In other words, `Tar` records only one significant bit of information: whether
+plain files are executable by their owner or not. No permission information for
+directories or symlinks is considered significant. This one bit of information
+is the only one which makes sense across all platforms, so this choice makes
+`Tar`'s behavior as portable as possible. On systems (like Windows) that do not
+use POSIX modes, whatever permission mechanism exists (_e.g._ ACLs) should be
+queried/modified to determine whether each file is executable by its owner or
+not. Unfortunately, this is currently broken on Windows since `libuv` does not
+correctly support querying or changing the user executable "bit"; this is
+actively being worked on, however, and should be fixed in future versions of
+Julia.
+
+In the future, optional support may be added for recording exact permission
+modes on POSIX systems, and possibly for normalizing permissions on extraction
+in the same way that they are normalized upon archive creation.
+
+### Reproducibility
+
+The information that `Tar` records about permissions is the same information
+that `git` considers to be significant when recording and hashing tree contents
+(admittedly not by coincidence). As a result, an important and useful
+consequence of `Tar`'s design is that it has the following properties:
+
+* if you create a tarball from a file tree and extract it, the new tree will
+ have the same `git` tree hash as the original;
+* if you `git checkout` a file tree and archive it using `Tar`, the resulting
+ TAR archive file is always the same.
+
+One important caveat to keep in mind is that `git` ignores directories that
+recursively contain only directories—_i.e._ unless there's a file or a symlink
+somewhere, `git` will not acknowledge the existence of a subdirectory. This
+means that two trees with the same `git` tree hash can produce different
+tarballs if they differ by subdirectories containing no files or symlinks: `git`
+will ignore those subdirectories, while `Tar` will not. Therefore, they will
+have the same `git` tree hash, but produce different tarballs. Two _identical_
+file trees will always produce identical tarballs, however, and that tarball
+should remain stable in future versions of the `Tar` package.
+
+The `tree_hash` function can be used to compute a git-style tree hash of the
+contents of a tarball (without needing to extract it). Moreover, two tarballs
+created by the `Tar` package will have the same hash if and only if they contain
+the same file tree, which is true if and only if they are identical tarballs.
+You can, however, hash tarballs not created by `Tar` this way to see if they
+represent the same file tree, and you can use the `skip_empty=true` option to
+`tree_hash` to compute the hash that `git` would assign the tree, ignoring empty
+directories.