Incremental Xmlupload
When uploading data with the xmlupload command,
resources can reference each other with an internal ID,
e.g. in the <resptr> tag.
Once the data is in DSP,
the resources can no longer be referenced by their internal ID.
Instead, the IRI that DSP generated for the resource must be used.
After a successful xmlupload,
the mapping of internal IDs to their respective IRIs
is written to a file called id2iri_mapping_[timestamp].json.
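To illustrate what such a mapping looks like in use, here is a minimal Python sketch. It assumes the file is a flat JSON object whose keys are internal IDs and whose values are IRIs. The sample content is made up for illustration: only the book_1 entry comes from this document, while book_2 and its IRI are hypothetical placeholders.

```python
import json

# Hypothetical content of an id2iri_mapping_[timestamp].json file,
# assumed to be a flat JSON object: internal ID -> IRI.
sample_mapping_json = """
{
    "book_1": "http://rdfh.ch/4123/nyOODvYySV2nJ5RWRdmOdQ",
    "book_2": "http://rdfh.ch/4123/illustrative-placeholder-iri"
}
"""


def load_mapping(json_text: str) -> dict[str, str]:
    """Parse the mapping and check that it maps strings to strings."""
    mapping = json.loads(json_text)
    if not isinstance(mapping, dict):
        raise ValueError("expected a JSON object at the top level")
    if not all(isinstance(k, str) and isinstance(v, str) for k, v in mapping.items()):
        raise ValueError("expected a flat mapping of internal IDs to IRIs")
    return mapping


mapping = load_mapping(sample_mapping_json)
# Look up the IRI that was generated for the resource with the internal ID "book_1".
print(mapping["book_1"])
```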
The mapping is needed if additional data is to be uploaded at a later point in time. Depending on the type of references the additional data contains, there are four ways to upload it:
- no references to existing resources: normal xmlupload
- references to existing resources via IRIs: incremental xmlupload
- references to existing resources via internal IDs: first id2iri, then incremental xmlupload
- continue an interrupted xmlupload: first id2iri, then incremental xmlupload
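The first three cases can be told apart by inspecting the references in the additional XML file. The following Python sketch is only an illustration, not part of DSP-TOOLS. It assumes that resources are declared as <resource ... id="..."> elements, that IRIs of existing resources start with http://rdfh.ch/ (as in the example IRI in this document), and it only looks at <resptr> tags, ignoring links inside text values. The function name choose_workflow is hypothetical.

```python
import re

# Assumption: IRIs of existing resources start with this prefix.
IRI_PREFIX = "http://rdfh.ch/"
# Matches <resptr>target</resptr>, with optional attributes on the tag.
RESPTR_PATTERN = re.compile(r"<resptr(?:\s[^>]*)?>\s*([^<\s]+)\s*</resptr>")
# Assumption: resources in the file are declared as <resource ... id="...">.
RESOURCE_ID_PATTERN = re.compile(r'<resource\b[^>]*\bid="([^"]+)"')


def choose_workflow(xml_text: str) -> str:
    """Suggest an upload workflow based on the <resptr> targets in the file."""
    targets = RESPTR_PATTERN.findall(xml_text)
    local_ids = set(RESOURCE_ID_PATTERN.findall(xml_text))
    # References to resources defined in the same file need no special handling.
    external = [t for t in targets if t not in local_ids]
    if not external:
        return "normal xmlupload"
    if all(t.startswith(IRI_PREFIX) for t in external):
        return "incremental xmlupload"
    return "id2iri, then incremental xmlupload"


example = (
    '<knora><resource label="Page" restype=":Page" id="page_1">'
    '<resptr-prop name=":isPartOf"><resptr>book_1</resptr></resptr-prop>'
    "</resource></knora>"
)
# book_1 is not defined in this file and is not an IRI.
print(choose_workflow(example))
```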
1. No References to Existing Resources
The first case is the simplest one: No mapping is required, and the additional data can be uploaded with:
dsp-tools xmlupload additional_data.xml
2. References to Existing Resources Via IRIs
The second case is relatively easy, too:
The file additional_data.xml contains references like <resptr>http://rdfh.ch/4123/nyOODvYySV2nJ5RWRdmOdQ</resptr>.
Such a file can be uploaded with:
dsp-tools xmlupload additional_data.xml
3. References to Existing Resources Via Internal IDs
The third case, however, is a bit more complicated:
The file additional_data.xml contains references like <resptr>book_1</resptr>,
or <text><a class="salsah-link" href="IRI:book_1:IRI">link to book_1</a></text>,
where book_1 was the internal ID of a resource that had previously been uploaded to DSP.
Before such an XML file can be uploaded,
the internal IDs must be replaced with their respective IRIs.
That's where the JSON mapping file comes in:
It contains a mapping from book_1 to http://rdfh.ch/4123/nyOODvYySV2nJ5RWRdmOdQ.
As a first step,
a new file must be generated with the id2iri command:
dsp-tools id2iri additional_data.xml id2iri_mapping_[timestamp].json
In a second step, the newly generated XML file can be uploaded to DSP:
dsp-tools xmlupload additional_data_replaced_[timestamp].xml
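Conceptually, the replacement step amounts to a lookup in the mapping for every reference. The Python sketch below illustrates the idea on the two reference forms shown above. It is not the implementation used by the id2iri command. In particular, it assumes that a link of the form href="IRI:book_1:IRI" is rewritten to an href containing the plain IRI, and it leaves IDs that are not in the mapping untouched. The function name replace_ids is hypothetical.

```python
import re

# Mapping as described in this section: internal ID -> IRI.
mapping = {"book_1": "http://rdfh.ch/4123/nyOODvYySV2nJ5RWRdmOdQ"}


def replace_ids(xml_text: str, mapping: dict[str, str]) -> str:
    """Replace internal IDs found in the mapping with their IRIs."""

    def resptr_sub(match: re.Match) -> str:
        internal_id = match.group(2).strip()
        # Unknown IDs are kept: they may point to resources in the same file.
        return match.group(1) + mapping.get(internal_id, internal_id) + match.group(3)

    def href_sub(match: re.Match) -> str:
        internal_id = match.group(1)
        if internal_id in mapping:
            # Assumption: the replaced link carries the plain IRI.
            return f'href="{mapping[internal_id]}"'
        return match.group(0)

    xml_text = re.sub(r"(<resptr(?:\s[^>]*)?>)([^<]+)(</resptr>)", resptr_sub, xml_text)
    xml_text = re.sub(r'href="IRI:([^":]+):IRI"', href_sub, xml_text)
    return xml_text


snippet = (
    "<resptr>book_1</resptr>"
    '<text><a class="salsah-link" href="IRI:book_1:IRI">link to book_1</a></text>'
    "<resptr>page_9</resptr>"
)
print(replace_ids(snippet, mapping))
```

Note that only the reference itself is replaced. The link text ("link to book_1") is ordinary content and stays as it is.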
4. Continue an Interrupted Xmlupload
If an xmlupload didn't finish successfully, some resources have already been created, while others have not. If one of the remaining resources references an already created resource by its internal ID, this internal ID must be replaced by the IRI of the already created resource.
Additionally, the already created resources must be removed from the XML file. Otherwise, they would be created a second time.
In such a case, proceed as follows:
- Initial xmlupload:
dsp-tools xmlupload data.xml
- A crash happens. Some resources have been uploaded, and an id2iri_mapping_[timestamp].json file has been written
- Fix the reason for the crash
- Replace the internal IDs and remove the created resources with:
dsp-tools id2iri data.xml --remove-resources id2iri_mapping_[timestamp].json
- Upload the resulting XML file with:
dsp-tools xmlupload data_replaced_[timestamp].xml
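The effect of removing already created resources can be pictured as follows. This Python sketch is an illustration under simplifying assumptions, not the code behind --remove-resources: it assumes that resources are direct children of the root element, declared as <resource ... id="...">, and it uses a small sample document without XML namespaces. The function name remove_created_resources and the sample data are hypothetical. Replacing the remaining internal IDs is a separate step that is not shown here.

```python
import xml.etree.ElementTree as ET


def remove_created_resources(xml_text: str, mapping: dict[str, str]) -> str:
    """Drop every <resource> whose id appears in the mapping (already created)."""
    root = ET.fromstring(xml_text)
    # Iterate over a copy, because the tree is modified during the loop.
    for resource in list(root):
        if resource.tag == "resource" and resource.get("id") in mapping:
            root.remove(resource)
    return ET.tostring(root, encoding="unicode")


# Simplified sample data: book_1 was created before the crash, book_2 was not.
data = (
    "<knora>"
    '<resource label="Book 1" restype=":Book" id="book_1"></resource>'
    '<resource label="Book 2" restype=":Book" id="book_2"></resource>'
    "</knora>"
)
mapping = {"book_1": "http://rdfh.ch/4123/nyOODvYySV2nJ5RWRdmOdQ"}

remaining = remove_created_resources(data, mapping)
print(remaining)
```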