Excel files for data modelling and data import
dsp-tools is able to process Excel files and output the appropriate JSON or XML file. The JSON/XML file can then be used to create the ontology on the DSP server or import data to the DSP repository. dsp-tools can also be used to create a list from an Excel file.
JSON project file: "resources" section from Excel file
With dsp-tools, the resources
section used in a data model (JSON) can be created from an Excel file. The command for
this is documented here.
Only XLSX
files are allowed. The resources
section can be inserted into the ontology file and then be uploaded onto
a DSP server.
An Excel file template can be found here. It is recommended to work from the template.
The expected worksheets of the Excel file are:
classes
: a table with all resource classes intended to be used in the resulting JSONclass1
,class2
,...: a table for each resource class named after its name
The worksheet called classes
must have the following structure:
The expected columns are:
name
(mandatory): The name of the resourceen
,de
,fr
,it
,rm
: The labels of the resource in different languages, at least one language has to be providedcomment_en
,comment_de
,comment_fr
,comment_it
,comment_rm
(optional): comments in the respective languagesuper
(mandatory): The base class(es) of the resource, separated by commas
The optional columns may be omitted in the Excel.
All other worksheets, one for each resource class, have the following structure: { width=50% }
The expected columns are:
Property
(mandatory): The name of the propertyCardinality
(mandatory): The cardinality, one of:1
,0-1
,1-n
,0-n
The GUI order is given by the order in which the properties are listed in the Excel sheet.
For further information about resources, see here.
JSON project file: "properties" section from Excel file
With dsp-tools, the properties
section used in a data model (JSON) can be created from an Excel file. The command for
this is documented here.
Only the first worksheet of the Excel file is considered and only XLSX files are allowed. The properties
section can
be inserted into the ontology file and then be uploaded onto a DSP server.
An Excel file template can be found here. It is recommended to work from the template.
The Excel sheet must have the following structure:
The expected columns are:
name
(mandatory): The name of the propertysuper
(mandatory): The base property/ies of the property, separated by commasobject
(mandatory): If the property is derived fromhasValue
, the type of the property must be further specified by the object it takes, e.g.TextValue
,ListValue
, orIntValue
. If the property is derived fromhasLinkTo
, theobject
specifies the resource class that this property refers to.en
,de
,fr
,it
,rm
: The labels of the property in different languages, at least one language has to be providedcomment_en
,comment_de
,comment_fr
,comment_it
,comment_rm
(optional): comments in the respective languagegui_element
(mandatory): The GUI element for the propertygui_attributes
(optional): The gui_attributes in the form "attr: value, attr: value".
The optional columns may be omitted in the Excel.
For backwards compatibility, files containing a column hlist
are valid, but deprecated.
For further information about properties, see here.
JSON project file: "lists" section from Excel file(s)
With dsp-tools, the "lists" section of a JSON project file can be created from one or several Excel files. The lists can then be inserted into a JSON project file and uploaded to a DSP server. The command for this is documented here.
The following example shows how to create the "lists" section from the two Excel files de.xlsx
and en.xlsx
which are located
in a directory called listfolder
:
dsp-tools excel2lists listfolder lists.json
The Excel sheets must have the following structure:
Some notes:
- The data must be in the first worksheet of each Excel file.
- It is important that all Excel files have the same structure. So, the translation of a label in the second Excel file has to be in the exact same cell as the one in the first Excel file.
- Only Excel files with file extension
.xlsx
are considered. - The file name must consist of the language label, e.g.
de.xlsx
/en.xlsx
. - The language has to be one of {de, en, fr, it, rm}.
- As node name, a simplified version of the English label is taken. If English is not available, one of the other languages is taken.
- If there are two nodes with the same name, an incrementing number is appended to the name.
- After the creation of the list, a validation against the JSON schema for lists is performed. An error message is printed out if the list is not valid.
It is recommended to work from the following templates:
en.xlsx: File with the English labels
de.xlsx: File with the German labels
The output of the above command, with the template files, is:
{
"lists": [
{
"name": "colors",
"labels": {
"de": "Farben",
"en": "colors"
},
"comments": {
"de": "Farben",
"en": "colors"
},
"nodes": [
{
"name": "red",
"labels": {
"de": "rot",
"en": "red"
}
},
...
]
},
{
"name": "category",
"labels": {
"de": "Kategorie",
"en": "category"
},
"comments": {
"de": "Kategorie",
"en": "category"
},
"nodes": [
{
"name": "artwork",
"labels": {
"de": "Kunstwerk",
"en": "artwork"
}
},
...
]
},
{
"name": "faculties-of-the-university-of-basel",
"labels": {
"de": "Fakultäten der Universität Basel",
"en": "Faculties of the University of Basel"
},
"comments": {
"de": "Fakultäten der Universität Basel",
"en": "Faculties of the University of Basel"
},
"nodes": [
{
"name": "faculty-of-science",
"labels": {
"de": "Philosophisch-Naturwissenschaftliche Fakultät",
"en": "Faculty of Science"
}
},
...
]
}
]
}
XML data file from Excel/CSV file
There are two use cases for a transformation from Excel/CSV to XML:
- The CLI command
dsp-tools excel2xml
creates an XML file from an Excel/CSV file which is already structured according to the DSP specifications. This is mostly used for DaSCH-interal data migration. - The module
excel2xml
can be imported into a custom Python script that transforms any tabular data into an XML. This use case is more frequent, because data from research projects have a variety of formats/structures. The moduleexcel2xml
is documented here.
CLI command excel2xml
The command line tool is used as follows:
dsp-tools excel2xml data-source.xlsx 1234 shortname
There are no flags/options for this command.
The Excel file must be structured as in this image:
Some notes:
- The special tags
<annotation>
,<link>
, and<region>
are represented as resources of restypeAnnotation
,LinkObj
, andRegion
. - The columns "ark" and "iri" are only used for DaSCH-internal data migration.