Commands Reference

dpm install

Download and install data packages listed in a descriptor file.

This command reads a TOML descriptor file that lists data packages and their associated resources, downloads each listed package, and saves it into the output directory. Running it ensures that every dataset named in the descriptor is available locally.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| descriptor | Path | The path to the TOML file containing the package descriptors. The default value is "data.toml". This file should define the sources from which the data packages will be downloaded. | Path('data.toml') |
| output_dir | Path | The directory where the downloaded data packages and resources will be saved. The default is "datapackages". This directory will be created if it does not exist. | Path('datapackages') |

Example

To install data packages from a specified TOML descriptor file:

dpm install data.toml --output-dir datapackages

This command will read the data.toml file, download the listed data packages, and save them into the datapackages folder.

dpm load

Load data packages into the database.

This command loads the data packages defined in a TOML manifest file into the database. It creates any required control tables, checks which resources already exist to determine whether they need to be updated, and loads resources based on their checksums. If a specific package is provided, only that package is loaded; otherwise, every package defined in the manifest is loaded.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| manifest | Path | The path to the TOML file that contains the data package descriptors. The default value is "data.toml". This file should list all the packages to be loaded into the database. | Path('data.toml') |
| package | Path | The path to a specific data package (datapackage.json) to load. If provided, only this package will be loaded; otherwise, all packages from the manifest will be processed. | None |

Example

To load data packages defined in a manifest file:

dpm load data.toml

To load a specific data package:

dpm load data.toml --package datapackages/package_name/datapackage.json

This command will read the data.toml file and load only the specified data package into the database, creating the control tables first if they do not already exist.
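
The checksum-based update check described for this command can be sketched as follows. This is an illustration under stated assumptions, not dpm's actual implementation: the `loaded_resources` control table, its columns, the use of SHA-256, and SQLite as the database are all hypothetical stand-ins, since this reference does not name the database, the hash algorithm, or the control-table schema.

```python
import hashlib
import sqlite3


def checksum(data: bytes) -> str:
    """Hex digest of the resource contents (SHA-256 is an assumption)."""
    return hashlib.sha256(data).hexdigest()


def ensure_control_table(conn: sqlite3.Connection) -> None:
    # HYPOTHETICAL control table: name and columns are illustrative only.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS loaded_resources ("
        "package TEXT, resource TEXT, checksum TEXT, "
        "PRIMARY KEY (package, resource))"
    )


def needs_load(conn: sqlite3.Connection, package: str, resource: str, data: bytes) -> bool:
    """True if the resource was never loaded or its contents have changed."""
    row = conn.execute(
        "SELECT checksum FROM loaded_resources WHERE package = ? AND resource = ?",
        (package, resource),
    ).fetchone()
    return row is None or row[0] != checksum(data)


def record_load(conn: sqlite3.Connection, package: str, resource: str, data: bytes) -> None:
    """Store (or overwrite) the checksum after a successful load."""
    conn.execute(
        "INSERT OR REPLACE INTO loaded_resources VALUES (?, ?, ?)",
        (package, resource, checksum(data)),
    )


conn = sqlite3.connect(":memory:")
ensure_control_table(conn)

original = b"a,b\n1,2\n"
changed = b"a,b\n1,3\n"

first = needs_load(conn, "population", "cities", original)   # never loaded
record_load(conn, "population", "cities", original)
second = needs_load(conn, "population", "cities", original)  # unchanged
third = needs_load(conn, "population", "cities", changed)    # contents differ
print(first, second, third)
```

Keying the control table on the package and resource pair means a repeated load of unchanged data can be skipped, while any change in content triggers a reload.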

dpm concat

Concatenate resources from multiple data packages into a single CSV file.

This command selects data packages by a glob pattern, by an explicit list of package files, or by both. It concatenates the resources with the given name(s) across those packages and can optionally enrich the resulting DataFrame with additional identifier columns.
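
The package selection and the fallback to common resource names (both detailed under Parameters) can be sketched roughly as below. The helper names `select_packages` and `common_resource_names` are hypothetical, and the sketch matches the pattern against an in-memory list of filenames with `fnmatchcase` rather than globbing the current directory, so it is only an approximation of the behaviour described here.

```python
from fnmatch import fnmatchcase


def select_packages(available, pattern=None, packages=None):
    """Combine pattern matches with explicitly listed packages, without duplicates."""
    selected = []
    if pattern is not None:
        selected.extend(name for name in available if fnmatchcase(name, pattern))
    for name in packages or []:
        if name not in selected:
            selected.append(name)
    return selected


def common_resource_names(resources_by_package):
    """Resource names present in every package; empty set if there are none."""
    name_sets = [set(names) for names in resources_by_package.values()]
    if not name_sets:
        return set()
    return set.intersection(*name_sets)


# Illustrative filenames and resource names, not taken from a real project.
available = ["2022.json", "2023.json", "notes.txt"]
selected = select_packages(available, pattern="*.json", packages=["extra/2021.json"])

resources = {
    "2022.json": ["cities", "regions"],
    "2023.json": ["cities", "countries"],
}
common = common_resource_names(resources)
print(selected, common)
```

When the intersection is empty there is nothing to concatenate, which corresponds to the case where the command prints a message and exits.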

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| pattern | Optional[str] | A glob pattern to match data package filenames in the current directory. If provided, packages matching this pattern will be included in the concatenation process. | None |
| package | list[str] | A list of specific data package filenames to include in the concatenation. If both pattern and package are provided, the listed packages are included alongside those matched by pattern. | None |
| resource_name | list[str] | A list of resource names to concatenate from the specified packages. If not provided, the function will attempt to find common resource names across all packages. If there are no common resource names, a message will be printed and the function will exit. | None |
| enrich | list[str] | A list of key-value pairs for enriching the DataFrame. Each pair should be in the format "key=value", where key is the name of the new column to create, and value is the property from the data package to use for populating that column. This option is used to add additional identifier columns to the concatenated output. | None |
| output_dir | Path | The directory where the output CSV files will be saved. Defaults to 'data'. This directory will be created if it does not already exist. | Path('data') |

Example

To concatenate resources with the same name from all packages matching a pattern:

dpm concat "*.json" --enrich "year=year"

This command will match all JSON files in the current directory, concatenate the common resources, and add a new column year populated from the data package property year.
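
The "key=value" enrichment can be illustrated with a small sketch. It uses plain dictionaries instead of a DataFrame to stay dependency-free, and the helper names `parse_enrich` and `enrich_rows`, the example package properties, and the choice to raise an error on a malformed pair are assumptions rather than dpm's documented behaviour.

```python
def parse_enrich(pairs):
    """Turn ["column=property", ...] into a {column: property} mapping."""
    mapping = {}
    for pair in pairs:
        column, separator, prop = pair.partition("=")
        if not separator or not column or not prop:
            raise ValueError(f"expected key=value, got {pair!r}")
        mapping[column] = prop
    return mapping


def enrich_rows(rows, package_properties, mapping):
    """Add one column per mapping entry, filled from the package's properties."""
    extra = {column: package_properties.get(prop) for column, prop in mapping.items()}
    return [{**row, **extra} for row in rows]


mapping = parse_enrich(["year=year"])

# Illustrative package properties and rows, not taken from a real package.
package_properties = {"name": "stats-2023", "year": 2023}
rows = [{"city": "A", "value": 1}, {"city": "B", "value": 2}]

enriched = enrich_rows(rows, package_properties, mapping)
print(enriched)
```

Because every row from a given package receives the same value, the added column identifies which package each row came from once the resources are concatenated.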