User data in the user's home directory
DSP-TOOLS saves user data in the user's home directory,
in the folder .dsp-tools.
Here is an overview of its structure:
file/folder | command using it | description |
---|---|---|
xmluploads | xmlupload | saves id2iri mappings and error reports |
docker | start-stack | files necessary to startup Docker containers |
rosetta | rosetta | a clone of the rosetta test project |
logging.log, logging.log.1 | several ones | These two grow up to 3MB, then the oldest entries are deleted |
fast-xmlupload | fast xmlupload | shell script for local processing |
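
The behaviour described for logging.log and logging.log.1 resembles what Python's standard `RotatingFileHandler` provides. The following is a minimal sketch of such a setup, not necessarily how DSP-TOOLS configures its logging: the function name `make_rotating_logger`, the logger name, and the message format are assumptions made for illustration.

```python
import logging
import logging.handlers
from pathlib import Path


def make_rotating_logger(log_dir: Path, max_bytes: int = 3_000_000) -> logging.Logger:
    """Create a logger that writes to log_dir/logging.log (hypothetical helper).

    With backupCount=1, a full logging.log is renamed to logging.log.1,
    replacing the previous logging.log.1, so the oldest entries are dropped.
    """
    log_dir.mkdir(parents=True, exist_ok=True)
    handler = logging.handlers.RotatingFileHandler(
        filename=log_dir / "logging.log",
        maxBytes=max_bytes,
        backupCount=1,
        encoding="utf-8",
    )
    handler.setFormatter(logging.Formatter("%(asctime)s %(levelname)s %(message)s"))
    logger = logging.getLogger("rotating_demo")  # assumed logger name
    logger.setLevel(logging.INFO)
    logger.addHandler(handler)
    return logger
```

Under these assumptions, at most two files exist at any time, and each stays at or below roughly `max_bytes`.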
Remark: Docker is normally not able to access files
stored in the site-packages directory of a Python installation.
Therefore, it's necessary to copy the "docker" folder
to the user's home directory.
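
Copying a packaged folder into the home directory could look roughly like the sketch below. It is an illustration under assumptions, not the actual DSP-TOOLS implementation: the helper name `copy_resource_dir` and its parameters are hypothetical.

```python
import shutil
from importlib.resources import as_file, files
from pathlib import Path


def copy_resource_dir(package: str, resource_dir: str, target: Path) -> Path:
    """Copy a directory shipped inside an installed package to `target` (hypothetical helper)."""
    source = files(package) / resource_dir
    # as_file() yields a real filesystem path. For packages installed as plain
    # directories this is the original location; directories inside zip
    # archives are only supported from Python 3.12 on.
    with as_file(source) as source_path:
        # dirs_exist_ok=True lets a later call refresh an existing copy.
        shutil.copytree(source_path, target, dirs_exist_ok=True)
    return target
```

A call might look like `copy_resource_dir("dsp_tools", "docker", Path.home() / ".dsp-tools" / "docker")`, assuming the package ships a folder named docker at its top level.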
How to ship data files to the user
Accessing non-Python files (a.k.a. resources, a.k.a data files) in the code needs special attention.
Firstly, the build tool must be told to include this folder/files in the distribution.
In our case, this happens in [tool.poetry.include]
in the pyproject.toml file.
Secondly, when accessing the files on the user's machine,
the files inside site-packages
should be treated as read-only
to avoid a series of common problems
(e.g. when multiple users share a common Python installation,
when the package is loaded from a zip file,
or when multiple instances of a Python application run in parallel).
Thirdly, the files can neither be accessed with a relative path from the referencing file, nor with a path relative to the root of the project.
For example, if you have a structure like this:
dsp-tools
├── pyproject.toml
└── src
    └── dsp_tools
        ├── schemas
        │   └── data.xsd
        ├── __init__.py
        └── dsp_tools.py
neither of the following works in dsp_tools/dsp_tools.py:
# path relative to the referencing file
with open('schemas/data.xsd') as data_file:
    ...
# path relative to the root of the project
with open('src/dsp_tools/schemas/data.xsd') as data_file:
    ...
Both approaches fail because the working directory on the user's machine is the directory from which DSP-TOOLS is called, not the directory where the distribution files are located.
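
The effect of the working directory can be reproduced with a small, self-contained sketch. The two temporary directories stand in for the installation directory and for the directory the user happens to be in; the function names are chosen for illustration only.

```python
import os
import tempfile
from pathlib import Path


def can_open_relative(relative_path: str, working_dir: Path) -> bool:
    """Try to open `relative_path` while `working_dir` is the current directory."""
    previous = Path.cwd()
    os.chdir(working_dir)
    try:
        with open(relative_path, encoding="utf-8"):
            return True
    except FileNotFoundError:
        return False
    finally:
        os.chdir(previous)


def demo() -> tuple:
    """Return (result inside the install dir, result from another directory)."""
    with tempfile.TemporaryDirectory() as install, tempfile.TemporaryDirectory() as elsewhere:
        schema = Path(install) / "schemas" / "data.xsd"
        schema.parent.mkdir()
        schema.write_text("<schema/>", encoding="utf-8")
        return (
            can_open_relative("schemas/data.xsd", Path(install)),
            can_open_relative("schemas/data.xsd", Path(elsewhere)),
        )
```

The relative path resolves only while the current directory happens to be the one containing schemas; from any other directory, the same `open()` call raises `FileNotFoundError`.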
To circumvent this problem,
it was once common to manipulate a package’s __file__ attribute
in order to find the location of data files:
import os
data_path = os.path.join(os.path.dirname(__file__), 'schemas', 'data.xsd')
with open(data_path) as data_file:
    ...
However, this manipulation isn’t compatible with PEP 302-based import hooks, including importing from zip files and Python Eggs.
The canonical way is to use importlib.resources:
from importlib.resources import files
# address "schemas" directory in module syntax: needs __init__.py
data_text = files('dsp_tools.schemas').joinpath('data.xsd').read_text()
# avoid module syntax when addressing "schemas" directory: no __init__.py necessary
data_text = files('dsp_tools').joinpath('schemas/data.xsd').read_text()
Note that depending on how the directory is addressed,
an __init__.py file is necessary or can be omitted.
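
To try both addressing styles without installing anything, the following sketch builds a throwaway package in a temporary directory and reads its data file from an unrelated working directory. The package name `demo_pkg` and the function name are hypothetical; an __init__.py is placed in both directories so that the module-syntax variant works on all Python versions that provide `files()`.

```python
import importlib
import os
import sys
import tempfile
from importlib.resources import files
from pathlib import Path


def read_schema_from_demo_package() -> str:
    """Create demo_pkg/schemas/data.xsd, then read it via importlib.resources."""
    with tempfile.TemporaryDirectory() as site_packages, tempfile.TemporaryDirectory() as elsewhere:
        schemas = Path(site_packages) / "demo_pkg" / "schemas"
        schemas.mkdir(parents=True)
        (schemas.parent / "__init__.py").write_text("", encoding="utf-8")
        (schemas / "__init__.py").write_text("", encoding="utf-8")
        (schemas / "data.xsd").write_text("<schema/>", encoding="utf-8")

        sys.path.insert(0, site_packages)  # make the throwaway package importable
        importlib.invalidate_caches()
        previous = Path.cwd()
        os.chdir(elsewhere)  # the working directory is unrelated to the package
        try:
            # module syntax: "schemas" is addressed as a subpackage
            via_module = files("demo_pkg.schemas").joinpath("data.xsd").read_text(encoding="utf-8")
            # path syntax: "schemas" is addressed as a plain directory
            via_path = (files("demo_pkg") / "schemas" / "data.xsd").read_text(encoding="utf-8")
        finally:
            os.chdir(previous)
            sys.path.remove(site_packages)
        assert via_module == via_path
        return via_module
```

Both lookups return the file content regardless of the current working directory, which is the property the relative-path approaches lack.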
The information on this page is mainly based upon:
- https://stackoverflow.com/a/20885799/14414188
- https://stackoverflow.com/a/58941536/14414188
- https://setuptools.pypa.io/en/latest/userguide/datafiles.html#accessing-data-files-at-runtime