XML File Parsing and Processing for xmlupload and validate-data
Parsing XML Files and Transformations
The ParsedResource is the internal Python representation of our data XML file.
In order to keep breaking changes minimal when changes to the XML are required 
all processes that interact with the XML file should use the ParsedResource as its starting point.
Overview
---
title: Overview of Code Flow for xmlupload and validate-data
---
stateDiagram-v2
    state "etree Root" as eroot1
    state "etree Root" as eroot2
    state "etree Root" as eroot3
    state "Root Modifications" as rootwork
    state "etree Into Python Representation" as pywork
    state "CLI: xmlupload" as processed
    state "ParsedResource" as parsedres1
    state "ParsedResource" as parsedres2
    state "ParsedResource" as parsedres3
    state "ProcessedResource" as Processedres
    state "CLI: validate-data" as valdata
    [*] --> eroot1
    state rootwork {
        eroot1 --> eroot2: xsd validation success
        eroot1 --> [*]: xsd validation error raised
    }
    eroot2 --> pywork
    state pywork {
        eroot3 --> parsedres1: processed representation
    }
    pywork --> processed
    state processed {
        valdata --> [*]: validation failure
        valdata --> parsedres2: validation success
        parsedres2 --> Processedres: resolve data
    }
    pywork --> valdata
    state valdata {
        parsedres3 --> RdfLikeResource: processing
    }Parsing XML Files in Detail
---
title: Parsing of XML File and Transformations Into ParsedResource
---
stateDiagram-v2
    state "Transform etree Root" as eroot
    state "Transform etree Root Into Python Representation" as transpy
    state "Resource etree" as resetree1
    state "ParsedResource" as respars
    state "Value etree" as valtree
    state "IIIF/bitstream etree" as filetree
    state "ParsedValue" as valpars
    state "ParsedFileValue" as filepars
    state "Transform Value" as transval
    state "Transform FileValue" as transfile
    [*] --> r1: Parse file
    r4 --> transpy
    state eroot {
        state "etree root" as r1
        state "etree root" as r2
        state "etree root" as r3
        state "etree root" as r4
        r1 --> r2: Remove Comments
        r2 --> [*]: xsd validation failure
        r2 --> r3: xsd validation success
        r3 --> r4: make localnames
    }
    state transpy {
        resetree1 --> transval: extract values
        resetree1 --> transfile: extract file or iiif
        resetree1 --> respars: transform special tags<br/><br/>resolve prefixes into absolute IRIs
        state transval {
            valtree --> valpars: transform special tags<br/><br/>resolve prefixes into absolute IRIs<br/><br/>resolve value type to Python Class<br/><br/>strip whitespaces from content
        }
        transval --> respars: add to resource
        state transfile {
            filetree --> filepars: resolve file type<br/><br/>strip whitespaces from content
        }
        transfile --> respars: add to resource
    }validate-data
validate-data has a specific process as in some cases 
the success of a previous step is required for subsequent validations.
Validation Process
stateDiagram-v2
    state "XSD validation" as XSD
    state "<b>STOP<b>" as stopXSD
    state "Check for Unknown Classes<br>(Python Logic)" as unknownCls
    state "<b>STOP<b>" as stopUnknown
    state "Ontology Validation<br>(SHACL-CLI)" as ontoVal
    state "<b>STOP<b>" as ontoViolation
    state "flag <em>--ignore-duplicate-files-warning<em>" as ignoreF
    state "Check for Duplicate Filepaths<br>(Python Logic)" as duplicFile
    state "severity: WARNING" as warning
    state "severity: INFO" as info
    state "severity: ERROR" as err
    state "Data Validation<br>(SHACL-CLI)" as dataSH
    [*] --> XSD
    XSD --> stopXSD: validation failure
    XSD --> unknownCls: success
    unknownCls --> stopUnknown: unknown found
    unknownCls --> ontoVal: success
    ontoVal --> ontoViolation: violations found
    ontoVal --> ignoreF: success
    ignoreF --> dataSH: present
    ignoreF --> duplicFile: not present
    duplicFile --> dataSH: continue
    duplicFile --> warning: duplicates found
    dataSH --> info: problems
    dataSH --> err: problems
    dataSH --> warning: problemsDetermine Validation Success
The validation success, i.e. if an xmlupload would be possible and is allowed to continue, is dependent on the server.
Some validation problems are allowed on test environments (including localhost), while the "prod-like" servers are stricter. Prod like servers include prod, ls-prod, stage, and rdu-stage.
| TEST ENVIRONMENTS | PROD-LIKE ENVIRONMENTS | |
|---|---|---|
| INFO | success | success | 
| WARNING | success | failure | 
| ERROR | failure | failure | 
xmlupload
The xmlupload requires validation and transformation before we can send an upload request.
The Python class ProcessedResource has a completely correct state, e.g. all the types are correct, etc.
and is used to create the graph for the upload.
From ParsedResource to ProcessedResource
---
title: Transformations from ParsedResource for xmlupload
---
stateDiagram-v2
    state "Process Resource" as processedres
    state "Process Value" as processedationval
    state "Process FileValues" as processedfile
    state "ParsedValue" as parsedval
    state "ParsedResource" as parsedres
    state "ProcessedValue" as valdes
    state "Continue" as cont
    parsedres --> processedfile
    parsedres --> processedationval
    parsedres --> processedres
    state processedres {
        ParsedResource --> Permissions: resolve permissions
    }
    state processedationval {
        parsedval --> valdes: resolve permissions<br/><br/>resolve listnodes to IRIs
    }
    state processedfile {
        ParsedFileValue --> ProcessedFileValue: resolve permissions<br/><br/>resolve metadata
    }
    processedres --> ProcessedResource: return result
    processedationval --> ProcessedResource: return result
    processedfile --> ProcessedResource: return result
    ProcessedResource --> cont: success
    ProcessedResource --> [*]: unexpected transformation failure