Commit 59eb1d33 authored by Brad King, committed by Kitware Robot

Merge topic 'ExternalData-sha512'

41bbe5e6 ExternalData: Switch from MD5 to SHA512 for new content links
ec4c43c0 git-gitlab-push: Add support for ExternalData SHA512 objects
97639acd pre-commit: Add support for ExternalData SHA512 objects
Acked-by: Kitware Robot <kwrobot@kitware.com>
Merge-request: !4288
parents 4b51695d 41bbe5e6
...@@ -66,7 +66,7 @@ endif() ...@@ -66,7 +66,7 @@ endif()
# Tell ExternalData commands to transform raw files to content links. # Tell ExternalData commands to transform raw files to content links.
# TODO: Condition this feature on presence of our pre-commit hook. # TODO: Condition this feature on presence of our pre-commit hook.
set(ExternalData_LINK_CONTENT MD5) set(ExternalData_LINK_CONTENT SHA512)
# Match series of the form <base>.<ext>, <base>_<n>.<ext> such that <base> may # Match series of the form <base>.<ext>, <base>_<n>.<ext> such that <base> may
# end in a (test) number that is not part of any series numbering. # end in a (test) number that is not part of any series numbering.
......
...@@ -117,10 +117,10 @@ Copy the data file into your local source tree. ...@@ -117,10 +117,10 @@ Copy the data file into your local source tree.
During configuration CMake will display a message such as: During configuration CMake will display a message such as:
Linked Some/Module/Testing/Data/Baseline/MyTest.png.md5 to ExternalData MD5/... Linked Some/Module/Testing/Data/Baseline/MyTest.png.sha512 to ExternalData SHA512/...
This means that CMake converted the file into a data object referenced This means that CMake converted the file into a data object referenced
by a "content link" named like the original file but with a `.md5` by a "content link" named like the original file but with a `.sha512`
extension. CMake also [renamed the original file](#externaldata). extension. CMake also [renamed the original file](#externaldata).
3. Build 3. Build
...@@ -146,15 +146,15 @@ Continue to [create the topic](develop.md#create-a-topic) and edit other ...@@ -146,15 +146,15 @@ Continue to [create the topic](develop.md#create-a-topic) and edit other
files as necessary. Add the content link and commit it along with the files as necessary. Add the content link and commit it along with the
other changes: other changes:
$ git add Some/Module/Testing/Data/Baseline/MyTest.png.md5 $ git add Some/Module/Testing/Data/Baseline/MyTest.png.sha512
$ git add Some/Module/Testing/Data/CMakeLists.txt $ git add Some/Module/Testing/Data/CMakeLists.txt
$ git commit $ git commit
The local `pre-commit` hook will display a message such as: The local `pre-commit` hook will display a message such as:
Some/Module/Testing/Data/Baseline/MyTest.png.md5: Added content to Git at refs/data/MD5/... Some/Module/Testing/Data/Baseline/MyTest.png.sha512: Added content to Git at refs/data/SHA512/...
Some/Module/Testing/Data/Baseline/MyTest.png.md5: Added content to local store at .ExternalData/MD5/... Some/Module/Testing/Data/Baseline/MyTest.png.sha512: Added content to local store at .ExternalData/SHA512/...
Content link Some/Module/Testing/Data/Baseline/MyTest.png.md5 -> .ExternalData/MD5/... Content link Some/Module/Testing/Data/Baseline/MyTest.png.sha512 -> .ExternalData/SHA512/...
This means that the pre-commit hook recognized that the content link This means that the pre-commit hook recognized that the content link
references a new data object and [prepared it for upload](#pre-commit). references a new data object and [prepared it for upload](#pre-commit).
...@@ -193,9 +193,9 @@ your build tree the `VTKData` target must be built. One may build the ...@@ -193,9 +193,9 @@ your build tree the `VTKData` target must be built. One may build the
target directly, e.g. `make VTKData`, to obtain the data without a target directly, e.g. `make VTKData`, to obtain the data without a
complete build. The output will be something like complete build. The output will be something like
-- Fetching ".../ExternalData/MD5/..." -- Fetching ".../ExternalData/SHA512/..."
-- [download 100% complete] -- [download 100% complete]
-- Downloaded object: "VTK-build/ExternalData/Objects/MD5/..." -- Downloaded object: "VTK-build/ExternalData/Objects/SHA512/..."
The downloaded files appear in `VTK-build/ExternalData` by default. The downloaded files appear in `VTK-build/ExternalData` by default.
...@@ -231,22 +231,22 @@ discuss details of the workflow implementation. ...@@ -231,22 +231,22 @@ discuss details of the workflow implementation.
While [CMake runs](#run-cmake) the [ExternalData][] module evaluates While [CMake runs](#run-cmake) the [ExternalData][] module evaluates
[DATA{} references](#add-test). VTK [sets](/CMake/vtkExternalData.cmake) [DATA{} references](#add-test). VTK [sets](/CMake/vtkExternalData.cmake)
the `ExternalData_LINK_CONTENT` option to `MD5` to enable automatic the `ExternalData_LINK_CONTENT` option to `SHA512` to enable automatic
conversion of raw data files into content links. When the module detects conversion of raw data files into content links. When the module detects
a real data file in the source tree it performs the following a real data file in the source tree it performs the following
transformation as specified in the module documentation: transformation as specified in the module documentation:
* Compute the MD5 hash of the file * Compute the SHA512 hash of the file
* Store the `${hash}` in a file with the original name plus `.md5` * Store the `${hash}` in a file with the original name plus `.sha512`
* Rename the original file to `.ExternalData_MD5_${hash}` * Rename the original file to `.ExternalData_SHA512_${hash}`
The real data now sit in a file that we [tell Git to ignore](/.gitignore). The real data now sit in a file that we [tell Git to ignore](/.gitignore).
For example: For example:
$ cat Some/Module/Testing/Data/Baseline/.ExternalData_MD5_477e602800c18624d9bc7a32fa706b97 |md5sum $ cat Some/Module/Testing/Data/Baseline/.ExternalData_SHA512_477e6028* |sha512sum
477e602800c18624d9bc7a32fa706b97 - 477e6028... -
$ cat Some/Module/Testing/Data/Baseline/MyTest.png.md5 $ cat Some/Module/Testing/Data/Baseline/MyTest.png.sha512
477e602800c18624d9bc7a32fa706b97 477e6028...
#### Recover Data File #### #### Recover Data File ####
...@@ -254,26 +254,26 @@ To recover the original file after running CMake but before committing, ...@@ -254,26 +254,26 @@ To recover the original file after running CMake but before committing,
undo the operation: undo the operation:
$ cd Some/Module/Testing/Data/Baseline $ cd Some/Module/Testing/Data/Baseline
$ mv .ExternalData_MD5_$(cat MyTest.png.md5) MyTest.png $ mv .ExternalData_SHA512_$(cat MyTest.png.sha512) MyTest.png
### pre-commit ### ### pre-commit ###
While [committing](#commit) a new or modified content link the While [committing](#commit) a new or modified content link the
[pre-commit](/Utilities/Scripts/pre-commit) hook moves the real data [pre-commit](/Utilities/Scripts/pre-commit) hook moves the real data
object from the `.ExternalData_MD5_${hash}` file left by the object from the `.ExternalData_SHA512_${hash}` file left by the
[ExternalData][] module to a local object repository stored in a [ExternalData][] module to a local object repository stored in a
`.ExternalData` directory at the top of the source tree. `.ExternalData` directory at the top of the source tree.
The hook also uses Git plumbing commands to store the data object The hook also uses Git plumbing commands to store the data object
as a blob in the local Git repository. The blob is not referenced as a blob in the local Git repository. The blob is not referenced
by the new commit but instead by `refs/data/MD5/${hash}`. by the new commit but instead by `refs/data/SHA512/${hash}`.
This keeps the blob alive in the local repository but does not add This keeps the blob alive in the local repository but does not add
it to the project history. For example: it to the project history. For example:
$ git for-each-ref --format="%(refname)" refs/data $ git for-each-ref --format="%(refname)" refs/data
refs/data/MD5/477e602800c18624d9bc7a32fa706b97 refs/data/SHA512/477e6028...
$ git cat-file blob refs/data/MD5/477e602800c18624d9bc7a32fa706b97 | md5sum $ git cat-file blob refs/data/SHA512/477e6028... | sha512sum
477e602800c18624d9bc7a32fa706b97 - 477e6028... -
### git gitlab-push ### ### git gitlab-push ###
...@@ -288,8 +288,8 @@ The script pushes the matching data objects to your VTK GitLab fork. ...@@ -288,8 +288,8 @@ The script pushes the matching data objects to your VTK GitLab fork.
For example: For example:
$ git gitlab-push --dry-run --no-topic $ git gitlab-push --dry-run --no-topic
* refs/data/MD5/477e602800c18624d9bc7a32fa706b97:refs/data/MD5/477e602800c18624d9bc7a32fa706b97 [new branch] * refs/data/SHA512/477e6028...:refs/data/SHA512/477e6028... [new branch]
Pushed refs/data/MD5/477e602800c18624d9bc7a32fa706b97 and removed local ref. Pushed refs/data/SHA512/477e6028... and removed local ref.
A GitLab webhook that triggers whenever a topic branch is pushed checks A GitLab webhook that triggers whenever a topic branch is pushed checks
for `refs/data/` in your VTK GitLab fork, fetches them, erases the refs for `refs/data/` in your VTK GitLab fork, fetches them, erases the refs
...@@ -299,9 +299,9 @@ from your fork, and uploads them to a location that we ...@@ -299,9 +299,9 @@ from your fork, and uploads them to a location that we
To verify that the data has been uploaded as expected, you may direct To verify that the data has been uploaded as expected, you may direct
a web browser to the location where ExternalData has uploaded the files. a web browser to the location where ExternalData has uploaded the files.
For VTK, that location is currently For VTK, that location is currently
`http://www.vtk.org/files/ExternalData/MD5/XXXX` where `XXXX` is the `http://www.vtk.org/files/ExternalData/SHA512/XXXX` where `XXXX` is the
complete MD5 hash stored in the content link file (e.g., the text in complete SHA512 hash stored in the content link file (e.g., the text in
`MyTest.png.md5`). `MyTest.png.sha512`).
### Publishing Data for an External Branch ### ### Publishing Data for an External Branch ###
...@@ -314,8 +314,8 @@ The workflow for adding data to an external branch of VTK is the same ...@@ -314,8 +314,8 @@ The workflow for adding data to an external branch of VTK is the same
as the above through the [commit](#commit) step, but diverges at the as the above through the [commit](#commit) step, but diverges at the
[push](#push) step because one will push to a separate repository. [push](#push) step because one will push to a separate repository.
Our ExternalData infrastructure intentionally hides the real data files Our ExternalData infrastructure intentionally hides the real data files
from Git so only the content links (`.md5` files) will be pushed. from Git so only the content links (`.sha512` files) will be pushed.
The real data objects will still be left in the `.ExternalData/MD5` The real data objects will still be left in the `.ExternalData/SHA512`
directory at the top of the VTK source tree by the directory at the top of the VTK source tree by the
[pre-commit](#pre-commit) hook. [pre-commit](#pre-commit) hook.
...@@ -330,7 +330,7 @@ In this example we assume the files are published on a [Github Pages][] ...@@ -330,7 +330,7 @@ In this example we assume the files are published on a [Github Pages][]
`gh-pages` branch in `username`'s fork of VTK. `gh-pages` branch in `username`'s fork of VTK.
Within the `gh-pages` branch the files are placed at Within the `gh-pages` branch the files are placed at
`ExternalData/MD5/$md5sum` where `$md5sum` is the MD5 hash of the content `ExternalData/SHA512/$sha512sum` where `$sha512sum` is the SHA512 hash of the content
(these are the same names they have in the `.ExternalData` directory in (these are the same names they have in the `.ExternalData` directory in
the original source tree). the original source tree).
......
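Review note: the three-step transformation described in the documentation change above (hash the file, write the content link, hide the original) can be reproduced by hand to check the new naming. The following is only a sketch, assuming GNU coreutils `sha512sum` is available. The function name `make_content_link` is hypothetical and not part of ExternalData, which performs this step itself while CMake runs.

```shell
#!/bin/sh
# Hypothetical helper, not part of ExternalData: mimic the documented
# transformation for a single raw data file.
make_content_link() {
  file=$1
  dir=$(dirname "$file")
  # Compute the SHA512 hash of the file (128 hex digits).
  hash=$(sha512sum < "$file" | cut -d ' ' -f 1) || return 1
  # Store the hash in a file with the original name plus ".sha512".
  printf '%s\n' "$hash" > "$file.sha512" || return 1
  # Rename the original file to ".ExternalData_SHA512_<hash>".
  mv "$file" "$dir/.ExternalData_SHA512_$hash"
}
```

Calling `make_content_link Some/Module/Testing/Data/Baseline/MyTest.png` on a scratch copy should leave `MyTest.png.sha512` next to a hidden `.ExternalData_SHA512_...` file, matching the names shown in the updated documentation.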
...@@ -81,12 +81,13 @@ data_report_and_remove() { ...@@ -81,12 +81,13 @@ data_report_and_remove() {
data_refs() { data_refs() {
git rev-list "$@" | git rev-list "$@" |
git diff-tree --no-commit-id --root -c -r --diff-filter=AM --stdin | git diff-tree --no-commit-id --root -c -r --diff-filter=AM --stdin |
egrep '\.(md5)$' | egrep '\.(md5|sha512)$' |
# read :srcmode dstmode srcobj dstobj status file # read :srcmode dstmode srcobj dstobj status file
while read _ _ _ obj _ file; do while read _ _ _ obj _ file; do
# Identify the hash algorithm used. # Identify the hash algorithm used.
case "$file" in case "$file" in
*.md5) algo=MD5 ; validate="^[0-9a-fA-F]{32}$" ;; *.md5) algo=MD5 ; validate="^[0-9a-fA-F]{32}$" ;;
*.sha512) algo=SHA512 ; validate="^[0-9a-fA-F]{128}$" ;;
*) continue ;; *) continue ;;
esac esac
......
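Review note: the `case` statement above pairs each content link extension with an algorithm name and a hash-length pattern (32 hex digits for MD5, 128 for SHA512). The code that applies `validate` lies outside this hunk, so the sketch below assumes it is used as an extended regular expression and checks it with `grep -E`. The function name `content_link_algo` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print the algorithm name if the content link's
# extension and its stored hash agree; return non-zero otherwise.
content_link_algo() {
  file=$1
  hash=$2
  # Same extension-to-pattern mapping as in the diff above.
  case "$file" in
    *.md5)    algo=MD5    ; validate='^[0-9a-fA-F]{32}$' ;;
    *.sha512) algo=SHA512 ; validate='^[0-9a-fA-F]{128}$' ;;
    *) return 1 ;;
  esac
  # Assumption: the pattern is applied as an extended regex to the hash text.
  printf '%s\n' "$hash" | grep -Eq "$validate" && printf '%s\n' "$algo"
}
```

With this mapping, a `.sha512` link that contains a 32-digit MD5 hash is rejected instead of being treated as a valid SHA512 object.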
...@@ -16,6 +16,7 @@ ExternalData_stage_linked_content() { ...@@ -16,6 +16,7 @@ ExternalData_stage_linked_content() {
# Identify the hash algorithm used. # Identify the hash algorithm used.
case "$file" in case "$file" in
*.md5) algo=MD5 ; base="${file/.md5}" ; validate="^[0-9a-fA-F]{32}$" ;; *.md5) algo=MD5 ; base="${file/.md5}" ; validate="^[0-9a-fA-F]{32}$" ;;
*.sha512) algo=SHA512 ; base="${file/.sha512}" ; validate="^[0-9a-fA-F]{128}$" ;;
*) die "$file: invalid content link (unrecognized extension)" ;; *) die "$file: invalid content link (unrecognized extension)" ;;
esac esac
...@@ -62,7 +63,7 @@ ExternalData_stage_linked_content() { ...@@ -62,7 +63,7 @@ ExternalData_stage_linked_content() {
ExternalData_non_content_link() { ExternalData_non_content_link() {
# Reject simultaneous raw file and content link. # Reject simultaneous raw file and content link.
files=$(git ls-files -- "$file.md5") files=$(git ls-files -- "$file.md5" "$file.sha512")
if test -n "$files"; then if test -n "$files"; then
die "$file: file may not coexist with $files" die "$file: file may not coexist with $files"
fi fi
...@@ -89,7 +90,7 @@ ExternalData_STORE=".ExternalData" ...@@ -89,7 +90,7 @@ ExternalData_STORE=".ExternalData"
# Process content links created by/for the CMake ExternalData module. # Process content links created by/for the CMake ExternalData module.
git diff-index --cached HEAD --diff-filter=AM | git diff-index --cached HEAD --diff-filter=AM |
while read src_mode dst_mode src_obj dst_obj status file; do while read src_mode dst_mode src_obj dst_obj status file; do
if echo "$dst_mode $file" | egrep_q '^100644 .*\.(md5)$'; then if echo "$dst_mode $file" | egrep_q '^100644 .*\.(md5|sha512)$'; then
ExternalData_stage_linked_content ExternalData_stage_linked_content
else else
ExternalData_non_content_link ExternalData_non_content_link
......
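Review note: the widened pattern in the `pre-commit` loop decides which staged entries are handled as content links. A small sketch of that filter follows. The definition of `egrep_q` is outside this diff, so `grep -Eq` stands in for it here as an assumption. The function name `is_content_link` is hypothetical, and the sample modes and paths are made up for illustration.

```shell
#!/bin/sh
# Hypothetical stand-in for the hook's egrep_q check: a quiet
# extended-regex match on "<mode> <path>".
is_content_link() {
  printf '%s %s\n' "$1" "$2" | grep -Eq '^100644 .*\.(md5|sha512)$'
}

# Made-up (mode, path) pairs in the form the loop tests.
for entry in '100644 Baseline/MyTest.png.sha512' \
             '100644 Baseline/MyTest.png.md5' \
             '100644 Baseline/MyTest.png' \
             '100755 Baseline/run.sha512'; do
  mode=${entry%% *}
  file=${entry#* }
  if is_content_link "$mode" "$file"; then
    echo "$file: content link"
  else
    echo "$file: not a content link"
  fi
done
```

Only regular, non-executable files (mode `100644`) ending in `.md5` or `.sha512` match, so the first two sample entries are reported as content links and the last two are not.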