Using the xmllib
Overview
The following code gives a minimal overview of how you can use the xmllib to create an XML file for a mass import to DSP.
It does not show or detail all available functions and features, but it links to the relevant, comprehensive documentation.
The core functionality that the xmllib provides is as follows:
- Classes that help you construct the resources and add values to them.
- Functions that help you clean and transform your input.
- Functions that help you validate your input.
- Configuration options such as Permissions.
We recommend creating the import script once your data model is relatively stable, as the code depends heavily on the data model.
Code Example
Import the xmllib
from dsp_tools import xmllib
Creating the Root
The XMLRoot is the central element to which you add all your resources.
See the documentation for details.
root = xmllib.XMLRoot.create_new(shortcode="0000", default_ontology="onto")
Creating a Resource
Resources which were defined in the ontology JSON are created as follows.
See the documentation for details.
resource = xmllib.Resource.create_new(
res_id="general_resource",
restype=":ResourceType",
label="Title in the DSP-APP",
)
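The res_id is how other parts of the file refer to a resource (for example, the region_of parameter of a RegionResource further below), so every ID should be unique within the XML file. IDs taken from a spreadsheet column often contain spaces or duplicates. The following is a minimal sketch of a hypothetical helper, make_res_ids, which is not part of the xmllib; it assumes that a conservative character set (letters, digits, "_" and "-") behind a letter prefix is acceptable for IDs.

```python
import re


def make_res_ids(raw_keys: list[str], prefix: str = "res") -> list[str]:
    """Hypothetical helper: turn raw keys (e.g. an ID column) into unique, XML-friendly IDs."""
    used: set[str] = set()
    ids: list[str] = []
    for key in raw_keys:
        # Replace every character outside the assumed safe set with "_"
        base = f"{prefix}_{re.sub(r'[^A-Za-z0-9_-]', '_', key.strip())}"
        candidate, n = base, 2
        # Append a numeric suffix until the ID has not been used yet
        while candidate in used:
            candidate = f"{base}_{n}"
            n += 1
        used.add(candidate)
        ids.append(candidate)
    return ids


ids = make_res_ids(["001", "001", "Ab c"])
```

The prefix guarantees that IDs start with a letter even when the raw keys are numbers.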
Adding Values to a Resource
Once you have created a Resource, you can add values to it.
For each value type, we provide three functions:
Adding one value
resource = resource.add_integer(prop_name=":hasInt", value=1)
Adding multiple values
resource = resource.add_integer_multiple(prop_name=":hasInt", values=[1, 2, 3])
Adding optional values
This function checks whether the input is empty. If it is, the resource is returned unchanged; otherwise, the value is added.
resource = resource.add_integer_optional(prop_name=":hasInt", value=None)
See the documentation for all options.
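The optional functions are convenient when a spreadsheet cell may be blank. What counts as "empty" matters: a value of 0 is real data, so a plain truthiness check (`if value:`) would silently drop it. The sketch below defines a hypothetical is_empty function; it is not the xmllib's own check, which may use different rules. It assumes that None, NaN (which pandas uses for blank cells) and whitespace-only strings are empty.

```python
import math
from typing import Any


def is_empty(value: Any) -> bool:
    """Hypothetical emptiness check: None, NaN and blank strings count as empty."""
    if value is None:
        return True
    if isinstance(value, float) and math.isnan(value):
        return True
    if isinstance(value, str) and not value.strip():
        return True
    # Everything else, including 0 and "0", is treated as a real value
    return False


row = {"count": 0, "note": "   ", "title": "Cat"}
# Keep only the cells that would actually be added to the resource
to_add = {key: value for key, value in row.items() if not is_empty(value)}
```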
Reformatting the Input With Conversion Functions
We provide a number of functions to help you convert your input. For example, DSP requires a special format for dates, so you need to reformat date input before adding it to a resource:
formatted_date = xmllib.value_converters.reformat_date(
date="10.1990-11.1990",
date_precision_separator=".",
date_range_separator="-",
date_format=xmllib.DateFormat.DD_MM_YYYY,
)
resource = resource.add_date(prop_name=":hasDate", value=formatted_date)
You can find further functions to convert values here.
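To make the conversion step more concrete, here is a simplified sketch of the kind of transformation involved. It is not the implementation of reformat_date, which supports more formats and options. It assumes that the target is a DSP date string of the form `CALENDAR:ERA:YYYY-MM-DD:ERA:YYYY-MM-DD` (month and day may be omitted), that the calendar is Gregorian and the era CE, and that the input is in day-month-year order. to_dsp_date is a hypothetical helper.

```python
def to_dsp_date(date: str, precision_sep: str = ".", range_sep: str = "-") -> str:
    """Hypothetical helper: convert "DD.MM.YYYY", "MM.YYYY" or "YYYY",
    optionally as a range "start-end", into a Gregorian, CE date string."""

    def iso_part(single: str) -> str:
        # "10.1990" -> ["10", "1990"] -> "1990-10"; day and month are zero-padded
        parts = [p.strip() for p in single.split(precision_sep)]
        year, day_month = parts[-1], parts[:-1]
        return "-".join([year] + [p.zfill(2) for p in reversed(day_month)])

    pieces = date.split(range_sep)
    if len(pieces) == 1:
        return f"GREGORIAN:CE:{iso_part(pieces[0])}"
    if len(pieces) == 2:
        start, end = (iso_part(p) for p in pieces)
        return f"GREGORIAN:CE:{start}:CE:{end}"
    raise ValueError(f"Cannot interpret date: {date!r}")


converted = to_dsp_date("10.1990-11.1990")
```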
Data Cleaning Functions
Often, the same data cleaning steps must be carried out again and again before values can be added to a resource. For example, many strings extracted from data sources represent not one value, but a list of values. We provide the following function to clean such input:
input_string = "This should be\na list\nof strings."
cleaned_list = xmllib.create_list_from_input(
input_value=input_string,
separator="\n"
)
You can find this and other functions to help you process your data here.
If you need a function that doesn't exist yet in the xmllib, please contact us with a feature request, especially if you feel that it may be generic enough to be helpful for other users.
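To illustrate what such a cleaning step does, the following is a simplified analogue of splitting a multi-value cell into a list. split_and_clean is a hypothetical helper and not the xmllib implementation; the sketch strips whitespace around each entry and drops empty entries, and create_list_from_input may handle edge cases differently.

```python
def split_and_clean(input_value: str, separator: str) -> list[str]:
    """Hypothetical helper: split a string, strip each part and drop empty parts."""
    return [part.strip() for part in input_value.split(separator) if part.strip()]


cleaned_list = split_and_clean("This should be\na list\nof strings.", separator="\n")
# Blank entries such as " " between two separators are removed before conversion
integers = [int(value) for value in split_and_clean("1; 2; ;3", separator=";")]
```

A list such as integers can then be passed to a multiple function like add_integer_multiple.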
Adding a File To a Resource
In the ontology you can specify that a resource should have a file attached.
You can add a file to the resource with the following function. Please note that only one file (or IIIF-URI, see below) is allowed per resource.
image_resource = image_resource.add_file(
filename="cat.tiff",
license=xmllib.LicenseRecommended.CC.BY,
copyright_holder="Mouse University",
authorship=["Minnie Mouse"],
)
We require that legal information be provided for resources with files. While the copyright holder and authorships are free text fields, the license must be chosen from a predefined set. The recommended licenses are listed here, and some others are listed here.
Please contact us if your required license is not available.
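Because only one file (or IIIF-URI) is allowed per resource, it can help to check your file list before calling add_file. The sketch below assumes a hypothetical table with one row per file and the columns res_id and filename; find_resources_with_several_files is not part of the xmllib.

```python
from collections import defaultdict


def find_resources_with_several_files(rows: list[dict[str, str]]) -> dict[str, list[str]]:
    """Return {res_id: [filenames]} for every resource that has more than one file."""
    files_by_resource: dict[str, list[str]] = defaultdict(list)
    for row in rows:
        # Hypothetical column names: "res_id" and "filename"
        files_by_resource[row["res_id"]].append(row["filename"])
    return {res_id: files for res_id, files in files_by_resource.items() if len(files) > 1}


rows = [
    {"res_id": "cat", "filename": "cat.tiff"},
    {"res_id": "cat", "filename": "cat_back.tiff"},
    {"res_id": "dog", "filename": "dog.tiff"},
]
conflicts = find_resources_with_several_files(rows)
```

Resources listed in conflicts need to be split into several resources or reduced to one file before the import.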
Adding IIIF-URIs To a Resource
You can add a IIIF-URI to a Resource whose super-class is StillImageRepresentation.
This way you can reference images that are hosted on other servers.
resource = resource.add_iiif_uri(
iiif_uri="https://iiif.wellcomecollection.org/image/b20432033_B0008608.JP2/full/1338%2C/0/default.jpg",
license=xmllib.LicenseRecommended.CC.BY_NC,
copyright_holder="Wellcome Collection",
authorship=["Cavanagh, Annie"]
)
Please note that the IIIF-URI must follow the official syntax specified here.
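If you want to pre-screen a list of URIs, the sketch below performs a rough check based on the IIIF Image API image request pattern `{scheme}://{server}{/prefix}/{identifier}/{region}/{size}/{rotation}/{quality}.{format}`. It is not the validation the xmllib performs: it only recognises image requests (not base URIs or info.json documents) and uses simplified patterns, so a URI it accepts may still be rejected. looks_like_iiif_image_request is a hypothetical helper.

```python
import re
from urllib.parse import unquote, urlsplit

# Simplified patterns for the last four path segments of an image request
REGION = r"full|square|\d+,\d+,\d+,\d+|pct:[\d.]+,[\d.]+,[\d.]+,[\d.]+"
SIZE = r"full|\^?(?:max|pct:[\d.]+|\d+,|,\d+|!?\d+,\d+)"
ROTATION = r"!?\d+(?:\.\d+)?"
QUALITY_FORMAT = r"(?:color|gray|bitonal|default)\.(?:jpg|tif|png|gif|jp2|pdf|webp)"


def looks_like_iiif_image_request(uri: str) -> bool:
    """Rough syntax check of a IIIF image request URI."""
    parts = urlsplit(uri)
    if parts.scheme not in ("http", "https") or not parts.netloc:
        return False
    segments = parts.path.strip("/").split("/")
    # Need at least an identifier plus region, size, rotation and quality.format
    if len(segments) < 5:
        return False
    # Segments may be percent-encoded, e.g. "1338%2C" stands for "1338,"
    region, size, rotation, quality_format = (unquote(s) for s in segments[-4:])
    checks = [(REGION, region), (SIZE, size), (ROTATION, rotation), (QUALITY_FORMAT, quality_format)]
    return all(re.fullmatch(pattern, value) for pattern, value in checks)
```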
Adding a Comment To a Value
You can add a comment to individual values or files with the comment parameter.
resource = resource.add_integer(
prop_name=":hasInt",
value=1,
comment="This text is a comment on this integer.",
)
Specifying Permissions
We recommend specifying permissions on the project level or on individual classes and properties.
However, you can override the default permissions with the permissions parameter:
image_resource = image_resource.add_simpletext(
prop_name=":hasRemark",
value="Text only visible for the project.",
permissions=xmllib.Permissions.PRIVATE,
)
See the documentation for details.
Adding a Resource To the Root
Once you have added all information to the resource, you can add the resource to the root.
root = root.add_resource(resource)
As with values, you can also add multiple or optional resources to the root.
Creating DSP-Base Resources
DSP has four additional resource types that can be used out of the box without defining them in the data model.
- AudioSegmentResource and VideoSegmentResource are resources that can be used to annotate a segment of an audio or video file of another resource.
- RegionResource describes a region of a resource with an image file.
- LinkResource is a collection of links to other resources.
These four types have pre-defined properties, which are either added directly when the resource is created or added afterwards with a dedicated function.
region = xmllib.RegionResource.create_new(
res_id="region_1",
label="Region label",
region_of="image_resource"
)
region = region.add_rectangle(
point1=(0.1, 0.1),
point2=(0.2, 0.2),
)
Please consult the individual documentation pages (linked above) for details.
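The points in the rectangle example above are fractions of the image size rather than pixel values. If your regions are recorded in pixels, you need to convert them first. The following sketch assumes that point coordinates are relative to the image width and height (between 0 and 1, as the values in the example suggest) with the origin in the top-left corner; to_relative_point is a hypothetical helper.

```python
def to_relative_point(x_px: float, y_px: float, width_px: int, height_px: int) -> tuple[float, float]:
    """Hypothetical helper: convert a pixel position into fractions of the image size."""
    if width_px <= 0 or height_px <= 0:
        raise ValueError("Image dimensions must be positive.")
    x_rel, y_rel = x_px / width_px, y_px / height_px
    # Reject points that fall outside the image instead of silently clipping them
    if not (0 <= x_rel <= 1 and 0 <= y_rel <= 1):
        raise ValueError(f"Point ({x_px}, {y_px}) lies outside the image.")
    return x_rel, y_rel


# A 4000 x 3000 px image with a rectangle from (400, 300) to (800, 600)
point1 = to_relative_point(400, 300, 4000, 3000)
point2 = to_relative_point(800, 600, 4000, 3000)
```

The resulting tuples have the same shape as the point1 and point2 arguments of add_rectangle.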
Writing the File
Once you have added all data, you can write the XML file with the following function.
root.write_file("data.xml")
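As an optional sanity check after writing, you can parse the file with Python's standard library: parsing raises an error if the file is not well-formed XML, and counting the resource elements lets you compare the result with the number of rows in your source data. The sketch assumes that resources are written as `<resource>` elements; count_resources is a hypothetical helper, and the check is not a full validation of the file.

```python
import xml.etree.ElementTree as ET


def count_resources(path: str) -> int:
    """Parse an XML file and count the <resource> elements in any namespace.

    Raises ET.ParseError if the file is not well-formed.
    """
    tree = ET.parse(path)
    # "{*}" matches the tag name in any namespace (Python 3.8+)
    return len(tree.getroot().findall(".//{*}resource"))
```

For example, count_resources("data.xml") returns the number of resources in the file written above.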
Validating Your Input
When you create resources and add data, the xmllib validates your input.
In the background, we use the validation functions specified here, so there is no need to check your input manually.
As a result, the xmllib may print a large amount of information to your terminal.
You can configure the warning level as described here.
You can also save the warnings to a CSV file instead of printing them; see here for details.