Helpers
create_label_to_name_list_node_mapping
Data sources often contain list values named after the "label" of the JSON project list node rather than its "name", which is what the dsp-tools xmlupload needs.
To create correct XML, you need a dictionary that maps the "labels" to their corresponding "names".
Parameters:
Name | Type | Description | Default |
---|---|---|---|
project_json_path | str | path to a JSON project file (a.k.a. ontology) | required |
list_name | str | name of a list in the JSON project | required |
language_of_label | str | which language of the label to choose | required |
Returns:
Type | Description |
---|---|
dict[str, str] | a dictionary of the form {label: name} |
Examples:
"lists": [
    {
        "name": "listName",
        "labels": {
            "en": "List",
            "de": "Liste"
        },
        "comments": { ... },
        "nodes": [
            {
                "name": "n1",
                "labels": {
                    "en": "Node 1",
                    "de": "Knoten 1"
                }
            },
            {
                "name": "n2",
                "labels": {
                    "en": "Node 2",
                    "de": "Knoten 2"
                }
            }
        ]
    }
]
result = xmllib.create_label_to_name_list_node_mapping(
    project_json_path="project.json",
    list_name="listName",
    language_of_label="de",
)
# result == {"Knoten 1": "n1", "knoten 1": "n1", "Knoten 2": "n2", "knoten 2": "n2"}
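The mapping idea can be sketched without the library. The following is a minimal illustration, not the dsp-tools implementation: the function name `build_label_to_name_mapping` is hypothetical, it takes the already-parsed "lists" array instead of a file path, and it assumes that nested "nodes" should be traversed recursively. Adding a lower-cased copy of each label mirrors the example output above.

```python
from typing import Any


def build_label_to_name_mapping(
    lists: list[dict[str, Any]],
    list_name: str,
    language_of_label: str,
) -> dict[str, str]:
    """Hypothetical sketch: map node labels (and their lower-cased form) to node names."""

    def walk(nodes: list[dict[str, Any]], mapping: dict[str, str]) -> None:
        for node in nodes:
            label = node.get("labels", {}).get(language_of_label)
            if label is not None:
                mapping[label] = node["name"]
                mapping[label.lower()] = node["name"]
            # Assumption: sub-nodes live under the same "nodes" key.
            walk(node.get("nodes", []), mapping)

    for lst in lists:
        if lst["name"] == list_name:
            result: dict[str, str] = {}
            walk(lst.get("nodes", []), result)
            return result
    raise KeyError(f"No list named {list_name!r} found")


lists = [
    {
        "name": "listName",
        "labels": {"en": "List", "de": "Liste"},
        "nodes": [
            {"name": "n1", "labels": {"en": "Node 1", "de": "Knoten 1"}},
            {"name": "n2", "labels": {"en": "Node 2", "de": "Knoten 2"}},
        ],
    }
]
mapping = build_label_to_name_mapping(lists, "listName", "de")
```

With such a dictionary, a raw value from the data source can be translated with `mapping.get(raw_value)`; a value that is missing from the mapping then surfaces as None instead of producing an invalid list node name.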
Source code in dsp/dsp-tools/src/dsp_tools/xmllib/helpers.py
escape_reserved_xml_characters
From richtext strings (encoding="xml"), escape the reserved characters <, >, and &, but only if they are not part of a standard standoff tag or escape sequence.
See the documentation for the standard standoff tags allowed by DSP-API, which will not be escaped.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
text | str | the richtext string to be escaped | required |
Returns:
Type | Description |
---|---|
str | The escaped richtext string |
Examples:
result = xmllib.escape_reserved_xml_characters("Text <unknownTag>")
# result == "Text &lt;unknownTag&gt;"
result = xmllib.escape_reserved_xml_characters("Text <br/> text after")
# result == "Text <br/> text after"
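The selective escaping described above can be approximated with a regular expression that protects known tags and existing escape sequences, then escapes everything else. This is a rough sketch under stated assumptions, not the library's code: `ALLOWED_TAGS` is a small illustrative subset (the real list of standard standoff tags is defined by DSP-API), and the function name is hypothetical.

```python
import re

# Assumption: illustrative subset only, NOT the official list of standoff tags.
ALLOWED_TAGS = ("br", "p", "strong", "em", "a")

_TAG = rf"</?(?:{'|'.join(ALLOWED_TAGS)})(?:\s[^<>]*)?/?>"
_ENTITY = r"&(?:amp|lt|gt|quot|apos|#\d+|#x[0-9a-fA-F]+);"
# One capturing group, so re.split() alternates: plain text, protected token, plain text, ...
_PROTECTED = re.compile(rf"({_TAG}|{_ENTITY})")


def escape_reserved_characters_sketch(text: str) -> str:
    """Hypothetical sketch: escape <, > and & outside allowed tags and entities."""
    parts = _PROTECTED.split(text)
    escaped = []
    for index, part in enumerate(parts):
        if index % 2 == 1:
            # Odd positions hold allowed tags or existing escape sequences: keep as-is.
            escaped.append(part)
        else:
            # Escape & first so that the ampersands of &lt; and &gt; are not escaped again.
            escaped.append(part.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;"))
    return "".join(escaped)
```

The order of the replacements matters: escaping `&` after `<` and `>` would turn the freshly produced `&lt;` into `&amp;lt;`.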
Source code in dsp/dsp-tools/src/dsp_tools/xmllib/helpers.py
find_date_in_string
Checks if a string contains a date value (single date or date range) and returns the first date found as a DSP-formatted string; see the XML documentation for details. Returns None if no date was found.
Notes
- All dates are interpreted in the Christian era and the Gregorian calendar.
- BC dates are only supported in French notation (e.g. 1000-900 av. J.-C.).
- The years 0000-2999 are supported, in 3/4-digit form.
- Dates written with slashes are always interpreted in a European manner: 5/11/2021 is the 5th of November.
- In the European notation, 2-digit years are expanded to 4 digits, with the current year as watershed:
- 30.4.24 -> 30.04.2024
- 30.4.25 -> 30.04.1925
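The watershed rule for 2-digit years can be written down as a small function. This is a sketch, not the library's code: the name `expand_two_digit_year` is hypothetical, and the inclusive boundary (a 2-digit year equal to the current one stays in the current century) is inferred from the two examples above, which are consistent with 2024 as the current year. The current year is passed in explicitly so that the result is reproducible.

```python
def expand_two_digit_year(two_digit_year: int, current_year: int) -> int:
    """Hypothetical sketch: expand a 2-digit year using the current year as watershed."""
    century = current_year // 100 * 100
    candidate = century + two_digit_year
    # Years that would lie in the future are moved back by one century.
    return candidate if candidate <= current_year else candidate - 100
```

Called with `current_year=2024`, the function maps 24 to 2024 and 25 to 1925, matching the examples in the notes.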
Currently supported date formats
- 0476-09-04 -> GREGORIAN:CE:0476-09-04:CE:0476-09-04
- 0476_09_04 -> GREGORIAN:CE:0476-09-04:CE:0476-09-04
- 30.4.2021 -> GREGORIAN:CE:2021-04-30:CE:2021-04-30
- 30.4.21 -> GREGORIAN:CE:2021-04-30:CE:2021-04-30
- 5/11/2021 -> GREGORIAN:CE:2021-11-05:CE:2021-11-05
- Jan 26, 1993 -> GREGORIAN:CE:1993-01-26:CE:1993-01-26
- 28.2.-1.12.1515 -> GREGORIAN:CE:1515-02-28:CE:1515-12-01
- 25.-26.2.0800 -> GREGORIAN:CE:0800-02-25:CE:0800-02-26
- 1.9.2022-3.1.2024 -> GREGORIAN:CE:2022-09-01:CE:2024-01-03
- 1848 -> GREGORIAN:CE:1848:CE:1848
- 1849/1850 -> GREGORIAN:CE:1849:CE:1850
- 1849/50 -> GREGORIAN:CE:1849:CE:1850
- 1845-50 -> GREGORIAN:CE:1845:CE:1850
- 840-50 -> GREGORIAN:CE:840:CE:850
- 840-1 -> GREGORIAN:CE:840:CE:841
- 1000-900 av. J.-C. -> GREGORIAN:BC:1000:BC:900
- 45 av. J.-C. -> GREGORIAN:BC:45:BC:45
Parameters:
Name | Type | Description | Default |
---|---|---|---|
string | str | string to check | required |
Returns:
Type | Description |
---|---|
str \| None | DSP-formatted date string, or None |
Examples:
result = xmllib.find_date_in_string("1849/1850")
# result == "GREGORIAN:CE:1849:CE:1850"
result = xmllib.find_date_in_string("not a valid date")
# result == None
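To make the output format more concrete, here is a simplified sketch that recognises only three of the formats listed above (ISO-like dates, European dates with a 4-digit year, and 4-digit year ranges with a slash). It is not the library's implementation: the name `find_simple_date` is hypothetical, the sketch tries the formats in a fixed order instead of returning the first date by position, and it does not validate month or day values.

```python
import re
from typing import Optional


def find_simple_date(string: str) -> Optional[str]:
    """Hypothetical sketch: detect a few date formats and return a DSP-formatted string."""
    # 0476-09-04 or 0476_09_04
    iso = re.search(r"(?<!\d)(\d{4})[-_](\d{2})[-_](\d{2})(?!\d)", string)
    if iso:
        year, month, day = iso.groups()
        date = f"{year}-{month}-{day}"
        return f"GREGORIAN:CE:{date}:CE:{date}"

    # 30.4.2021 (day.month.year, zero-padded in the output)
    european = re.search(r"(?<!\d)(\d{1,2})\.(\d{1,2})\.(\d{4})(?!\d)", string)
    if european:
        day, month, year = european.groups()
        date = f"{year}-{int(month):02d}-{int(day):02d}"
        return f"GREGORIAN:CE:{date}:CE:{date}"

    # 1849/1850
    year_range = re.search(r"(?<!\d)(\d{4})/(\d{4})(?!\d)", string)
    if year_range:
        start, end = year_range.groups()
        return f"GREGORIAN:CE:{start}:CE:{end}"

    return None
```

A single date is expressed as a range whose start and end are identical, which is why the date appears twice in the output.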
Source code in dsp/dsp-tools/src/dsp_tools/xmllib/helpers.py
make_xsd_compatible_id
An xsd:ID may not contain all types of special characters, and it must start with a letter or underscore.
Replace illegal characters with _, and prepend a leading _ if necessary.
The string must contain at least one Unicode letter (matching the regex \p{L}), _, !, ?, or number, but must not be None, <NA>, N/A, or -.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
input_value | str \| float \| int | input value | required |
Returns:
Type | Description |
---|---|
str | An xsd ID compatible string based on the input value |
Examples:
result = xmllib.make_xsd_compatible_id("0_Universität_Basel")
# result == "_0_Universit_t_Basel"
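The replacement rule can be sketched as follows. This is an illustration only: the function name is hypothetical, the set of characters kept unchanged (ASCII letters, digits, `_`, `.` and `-`) is an assumption that happens to reproduce the example above, and `ValueError` stands in for whatever error the library actually raises on invalid input.

```python
import re

# Values that the description above rules out as input.
FORBIDDEN_VALUES = {"None", "<NA>", "N/A", "-"}


def make_id_sketch(input_value: object) -> str:
    """Hypothetical sketch: turn a value into a string usable as xsd:ID."""
    text = str(input_value)
    # \w matches Unicode letters, digits and the underscore.
    if text in FORBIDDEN_VALUES or not re.search(r"[\w!?]", text):
        raise ValueError(f"Cannot build an ID from {input_value!r}")
    # Assumption: everything outside this ASCII set counts as illegal.
    cleaned = re.sub(r"[^A-Za-z0-9_.-]", "_", text)
    # An xsd:ID must start with a letter or underscore.
    if not re.match(r"[A-Za-z_]", cleaned):
        cleaned = "_" + cleaned
    return cleaned
```

Because several different inputs can collapse to the same cleaned string (for example "ä" and "ö" both become "_"), IDs built this way are not guaranteed to be unique.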
Source code in dsp/dsp-tools/src/dsp_tools/xmllib/helpers.py
make_xsd_compatible_id_with_uuid
An xsd:ID may not contain all types of special characters, and it must start with a letter or underscore.
Replace illegal characters with _, and prepend a leading _ if necessary.
Additionally, add a UUID at the end.
The UUID will be different each time the function is called.
The string must contain at least one Unicode letter (matching the regex \p{L}), _, !, ?, or number, but must not be None, <NA>, N/A, or -.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
input_value | str \| float \| int | input value | required |
Returns:
Type | Description |
---|---|
str | an xsd ID based on the input value, with a UUID attached |
Examples:
result = xmllib.make_xsd_compatible_id_with_uuid("Universität_Basel")
# result == "Universit_t_Basel_88f5cd0b-f333-4174-9030-65900b17773d"
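A minimal sketch of the UUID variant, under the same caveats as any re-implementation: the function name is hypothetical, the character set treated as legal is an assumption, and the use of a random (version 4) UUID is inferred from the shape of the example value above rather than confirmed by the source.

```python
import re
import uuid


def make_id_with_uuid_sketch(input_value: object) -> str:
    """Hypothetical sketch: sanitise the value and append a random UUID."""
    # Assumption: only ASCII letters, digits, "_", "." and "-" are kept unchanged.
    cleaned = re.sub(r"[^A-Za-z0-9_.-]", "_", str(input_value))
    if not re.match(r"[A-Za-z_]", cleaned):
        cleaned = "_" + cleaned
    # uuid4() is random, so every call yields a different suffix.
    return f"{cleaned}_{uuid.uuid4()}"


first = make_id_with_uuid_sketch("Universität_Basel")
second = make_id_with_uuid_sketch("Universität_Basel")
```

Since the suffix changes on every call, the returned ID has to be stored if other resources need to reference it later; calling the function a second time with the same input does not reproduce it.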
Source code in dsp/dsp-tools/src/dsp_tools/xmllib/helpers.py
create_list_from_string
Creates a list from a string. Leading and trailing whitespace is removed from the list items.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
string | str | input string | required |
separator | str | The character that separates the different values in the string. For example, a comma or newline. | required |
Returns:
Type | Description |
---|---|
list[str] | The list that results from splitting the input string. If the original string is empty or consists only of whitespace characters, the resulting list will be empty. |
Examples:
result = xmllib.create_list_from_string(" One/ Two\n/", "/")
# result == ["One", "Two"]
result = xmllib.create_list_from_string(" \n ", "\n")
# result == []
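The behaviour shown in the two examples (split, strip, drop empty items) can be reproduced in plain Python. The sketch below uses a hypothetical function name and is meant only to illustrate that behaviour, not to reflect how the library implements it.

```python
def split_and_strip_sketch(string: str, separator: str) -> list[str]:
    """Hypothetical sketch: split a string, strip each item, and drop empty items."""
    stripped = (item.strip() for item in string.split(separator))
    return [item for item in stripped if item]
```

Dropping the empty items is what turns a trailing separator, as in " One/ Two\n/", into a two-element list rather than a list with an empty third entry.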
Source code in dsp/dsp-tools/src/dsp_tools/xmllib/helpers.py
create_non_empty_list_from_string
Creates a list from a string. Leading and trailing whitespace is removed from the list items.
If the resulting list is empty, an InputError is raised.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
string | str | input string | required |
separator | str | The character that separates the different values in the string. For example, a comma or newline. | required |
resource_id | str \| None | If the ID of the resource is provided, a better error message can be composed | None |
prop_name | str \| None | If the name of the property is provided, a better error message can be composed | None |
Returns:
Type | Description |
---|---|
list[str] | The list that results from splitting the input string. |
Examples:
result = xmllib.create_non_empty_list_from_string("One\nTwo ", "\n")
# result == ["One", "Two"]
result = xmllib.create_non_empty_list_from_string(" \n/ ", "/")
# raises InputError
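The following sketch shows one way the optional resource_id and prop_name could feed into the error message. It is an assumption-laden illustration: `EmptyListError` is a hypothetical stand-in for the library's InputError, the function name is made up, and the wording of the message is invented.

```python
from typing import Optional


class EmptyListError(ValueError):
    """Hypothetical stand-in for the library's InputError."""


def split_non_empty_sketch(
    string: str,
    separator: str,
    resource_id: Optional[str] = None,
    prop_name: Optional[str] = None,
) -> list[str]:
    """Hypothetical sketch: like a split-and-strip helper, but fail on an empty result."""
    items = [item.strip() for item in string.split(separator) if item.strip()]
    if not items:
        # Only mention the details that the caller actually supplied.
        details = []
        if resource_id is not None:
            details.append(f"resource '{resource_id}'")
        if prop_name is not None:
            details.append(f"property '{prop_name}'")
        location = f" ({', '.join(details)})" if details else ""
        raise EmptyListError(f"The input string yields an empty list{location}.")
    return items
```

Passing both optional arguments makes it much easier to locate the offending cell when the function is called in a loop over many rows of a data source.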
Source code in dsp/dsp-tools/src/dsp_tools/xmllib/helpers.py