Commands Reference
dpm install
Download and install data packages listed in a descriptor file.
This command reads a TOML descriptor file that lists data packages and their associated resources, downloads each listed package, and saves it into the output directory, so that the datasets a project depends on are available locally.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
descriptor | Path | The path to the TOML file containing the package descriptors. The default value is "data.toml". This file should define the sources from which the data packages will be downloaded. | Path('data.toml') |
output_dir | Path | The directory where the downloaded data packages and resources will be saved. The default is "datapackages". This directory will be created if it does not exist. | Path('datapackages') |
Example
To install data packages from a specified TOML descriptor file:
dpm install data.toml --output-dir datapackages
This command will read the data.toml file, download the listed data packages, and save them into the datapackages folder.
dpm load
Load data packages into the database.
This command loads the data packages defined in a TOML manifest file into the database. It creates any necessary control tables, checks existing resources to determine whether they need to be updated, and loads resources based on their checksums. If a specific package is provided, only that package is loaded; otherwise, all packages defined in the manifest are loaded.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
manifest | Path | The path to the TOML file that contains the data package descriptors. The default value is "data.toml". This file should list all the packages to be loaded into the database. | Path('data.toml') |
package | Path | The path to a specific data package (datapackage.json) to load. If provided, only this package will be loaded; otherwise, all packages from the manifest will be processed. | None |
Example
To load data packages defined in a manifest file:
dpm load data.toml
To load a specific data package:
dpm load data.toml --package datapackages/package_name/datapackage.json
This command will read the data.toml file and load only the specified data package into the database, creating any necessary control tables first.
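The checksum-based behaviour described for this command can be illustrated with a small Python sketch. It is not dpm's actual code: the SQLite backend, the `resource_control` table name, the SHA-256 hash, and both helper functions are assumptions chosen only to show the idea of skipping resources whose content has not changed.

```python
import hashlib
import sqlite3


def ensure_control_table(conn: sqlite3.Connection) -> None:
    """Create the (hypothetical) control table if it is missing."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS resource_control ("
        "resource TEXT PRIMARY KEY, checksum TEXT NOT NULL)"
    )


def needs_load(conn: sqlite3.Connection, resource: str, content: bytes) -> bool:
    """Return True, and record the checksum, when a resource is new or changed."""
    checksum = hashlib.sha256(content).hexdigest()
    row = conn.execute(
        "SELECT checksum FROM resource_control WHERE resource = ?", (resource,)
    ).fetchone()
    if row is not None and row[0] == checksum:
        return False  # unchanged since the last load: skip it
    # New or changed: remember the checksum (the real data load would go here).
    conn.execute(
        "INSERT OR REPLACE INTO resource_control (resource, checksum) VALUES (?, ?)",
        (resource, checksum),
    )
    return True
```

With an in-memory database (`sqlite3.connect(":memory:")`), the first call to `needs_load` for a resource returns True, a second call with identical bytes returns False, and a call with modified bytes returns True again.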
dpm concat
Concatenate resources from multiple data packages into a single CSV file.
This command accepts either a glob pattern that matches multiple data package files or a list of specific packages. It concatenates resources from these packages based on the provided resource name(s) and optionally enriches the resulting DataFrame with additional identifier columns.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
pattern | Optional[str] | A glob pattern to match data package filenames in the current directory. If provided, packages matching this pattern will be included in the concatenation process. | None |
package | list[str] | A list of specific data package filenames to include in the concatenation. If both [...] | None |
resource_name | list[str] | A list of resource names to concatenate from the specified packages. If not provided, the function will attempt to find common resource names across all packages. If there are no common resource names, a message will be printed and the function will exit. | None |
enrich | list[str] | A list of key-value pairs for enriching the DataFrame. Each pair should be in the format "key=value", where [...] | None |
output_dir | Path | The directory where the output CSV files will be saved. Defaults to 'data'. This directory will be created if it does not already exist. | Path('data') |
Example
To concatenate resources with the same name from all packages matching a pattern:
dpm concat "*.json" --enrich "year=year"
This command will match all JSON files in the current directory, concatenate the common resources, and add a new column year populated from the data package property year.
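The three ideas behind this command (finding resource names common to all packages, parsing "key=value" enrich pairs, and stacking rows with the extra columns) can be sketched in plain Python. This is a simplified illustration rather than dpm's implementation: lists of dictionaries stand in for the DataFrame, resources are assumed to carry inline `data` rows, and all three function names are hypothetical.

```python
def common_resource_names(packages: list[dict]) -> list[str]:
    """Return the resource names present in every package, sorted."""
    if not packages:
        return []
    name_sets = [{r["name"] for r in p.get("resources", [])} for p in packages]
    return sorted(set.intersection(*name_sets))


def parse_enrich(pairs: list[str]) -> dict[str, str]:
    """Turn ["column=property", ...] into {column: property}."""
    mapping = {}
    for pair in pairs:
        column, sep, prop = pair.partition("=")
        if not sep or not column or not prop:
            raise ValueError(f"expected key=value, got {pair!r}")
        mapping[column] = prop
    return mapping


def concat_rows(packages: list[dict], resource_name: str, enrich: dict[str, str]) -> list[dict]:
    """Stack the rows of one resource across packages, adding enrich columns."""
    rows = []
    for package in packages:
        # Each new column takes its value from a package-level property.
        extra = {column: package.get(prop) for column, prop in enrich.items()}
        for resource in package.get("resources", []):
            if resource["name"] == resource_name:
                # Assumption: rows are stored inline under "data".
                rows.extend({**row, **extra} for row in resource["data"])
    return rows
```

For two packages whose `year` properties are 2020 and 2021, `concat_rows(packages, "sales", parse_enrich(["year=year"]))` would return the combined rows of the sales resource, each tagged with the year of the package it came from.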