pfd.op package

Submodules

pfd.op.collect module

class pfd.op.collect.CollectData(*args, **kwargs)[source]

Bases: OP

Collect and process molecular systems data for machine learning workflows.

This operation aggregates multiple atomic systems, applies optional sampling, and converts them to dpdata.MultiSystems format for downstream ML training. Supports both labeled and unlabeled data with optional train/test splitting.
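The optional train/test splitting mentioned above can be pictured with a minimal sketch. It is not the implementation used by CollectData: the helper name `split_frames`, the `seed` argument, and the rounding rule are all assumptions made for illustration, and plain Python objects stand in for the frames.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def split_frames(
    frames: Sequence[T], test_size: float = 0.2, seed: int = 0
) -> Tuple[List[T], List[T]]:
    """Hypothetical helper: split frames into (train, test) by a test fraction."""
    if not 0.0 <= test_size < 1.0:
        raise ValueError("test_size must be in [0, 1)")
    # Shuffle indices with a fixed seed so the split is reproducible.
    indices = list(range(len(frames)))
    random.Random(seed).shuffle(indices)
    # Assumed rounding rule: nearest integer number of test frames.
    n_test = int(round(len(frames) * test_size))
    test_idx = set(indices[:n_test])
    # Preserve the original frame order inside each subset.
    train = [f for i, f in enumerate(frames) if i not in test_idx]
    test = [f for i, f in enumerate(frames) if i in test_idx]
    return train, test


train, test = split_frames(list(range(10)), test_size=0.2)
```

With ten frames and `test_size=0.2`, two frames land in the test subset and eight in the training subset. Which frames are chosen depends on the seed.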

Examples

>>> from pathlib import Path
>>> from pfd.op.collect import CollectData
>>> collector = CollectData()
>>> result = collector.execute({
...     "systems": [Path("system1"), Path("system2")],
...     "type_map": ["H", "O"],
...     "optional_parameters": {"test_size": 0.2}
... })

execute(ip: OPIO) → OPIO[source]

Execute the OP.

Parameters:

ip (dict) –

Input dict with components:

  • structures: (Artifact(List[Path])) Configurations collected in this iteration.

  • pre_structures: (Artifact(Path), optional) A single extxyz file containing the configurations collected in previous iterations of the current stage.

Returns:

op –

Output dict with components:

  • task_names: (List[str]) The names of the tasks, used as their identifiers. Each task name is unique.

  • task_paths: (Artifact(List[Path])) The prepared working paths of the tasks, containing all input files needed to start the LAMMPS simulation. The order of the paths should be consistent with op["task_names"].

Return type:

dict

classmethod get_input_sign()[source]

Get the input signature for the operation.

Returns:

OPIOSign – The input signature.

classmethod get_output_sign()[source]

Get the signature of the outputs

pfd.op.converge module

pfd.op.inference module

pfd.op.model_test module

class pfd.op.model_test.ModelTestOP(*args, **kwargs)[source]

Bases: OP

execute(ip: OPIO) → OPIO[source]

Run the OP

classmethod get_input_sign()[source]

Get the signature of the inputs

classmethod get_output_sign()[source]

Get the signature of the outputs

pfd.op.pert_gen module

pfd.op.select_confs module

class pfd.op.select_confs.SelectConfs(*args, **kwargs)[source]

Bases: OP

Select configurations from exploration trajectories for labeling.

execute(ip: OPIO) → OPIO[source]

Execute the OP.

Parameters:

ip (dict) –

Input dict with components:

  • conf_selector: (ConfSelector) Configuration selector.

  • confs: (List[str]) The exploration trajectories.

  • init_confs: (Artifact(List[Path])) The initial configurations.

  • pre_confs: (Artifact(List[Path])) The trajectories generated in the exploration.

  • optional_parameters: (Dict) The optional parameters.

Returns:

Output dict with components:

  • confs: (Artifact(Path)) The selected configurations.

Return type:

Any

filter_by_entropy(iter_confs: List[Atoms], reference: List[Atoms] = [], chunk_size: int = 10, k=32, cutoff=5.0, batch_size: int = 1000, h=0.015, max_sel: int = 100, **kwargs) → List[Atoms][source]

Iteratively select configurations for maximum entropy.
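As a rough illustration of iterative maximum-entropy selection, the sketch below greedily adds whichever candidate most increases a Gaussian-kernel entropy estimate. This is an assumption-laden toy and not the method's actual implementation. Descriptors are taken as precomputed numeric vectors (the real method receives Atoms objects). The entropy estimator and the helper names `kernel_entropy` and `greedy_max_entropy` are hypothetical. Only `h` and `max_sel` mirror names from the signature; `chunk_size`, `k`, `cutoff` and `batch_size` are not modeled.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def kernel_entropy(points: Sequence[Vector], h: float) -> float:
    """Estimate H = -mean_i log(mean_j K(x_i, x_j)) with a Gaussian kernel of bandwidth h."""
    n = len(points)
    if n == 0:
        return 0.0
    total = 0.0
    for xi in points:
        density = sum(
            math.exp(-sum((a - b) ** 2 for a, b in zip(xi, xj)) / (2.0 * h * h))
            for xj in points
        ) / n
        # density >= 1/n because each point contributes K(x_i, x_i) = 1.
        total += math.log(density)
    return -total / n


def greedy_max_entropy(
    candidates: Sequence[Vector],
    reference: Sequence[Vector] = (),
    h: float = 0.015,
    max_sel: int = 100,
) -> List[int]:
    """Return indices of candidates picked one at a time to maximize the entropy estimate."""
    pool = list(reference)  # descriptors already in the data set
    picked: List[int] = []
    remaining = list(range(len(candidates)))
    while remaining and len(picked) < max_sel:
        # Score each remaining candidate by the entropy of the pool plus that candidate.
        best = max(remaining, key=lambda i: kernel_entropy(pool + [candidates[i]], h))
        picked.append(best)
        pool.append(candidates[best])
        remaining.remove(best)
    return picked


picked = greedy_max_entropy([(0.0,), (0.001,), (1.0,)], h=0.015, max_sel=2)
```

In this toy run the near-duplicate descriptor at 0.001 is skipped in favor of the distant one at 1.0, which is the behavior a diversity-driven selector is meant to show. The cost grows quickly with pool size, so a practical implementation would likely process candidates in chunks and use approximate neighbor searches.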

classmethod get_input_sign()[source]

Get the signature of the inputs

classmethod get_output_sign()[source]

Get the signature of the outputs

pfd.op.stage module

class pfd.op.stage.StageScheduler[source]

Bases: OP, ABC

execute(ip: OPIO) → OPIO[source]

Generate exploration tasks based on model and exploration styles

classmethod get_input_sign()[source]

Get the signature of the inputs

classmethod get_output_sign()[source]

Get the signature of the outputs

abstractmethod schedule(scheduler: Scheduler, *args, **kwargs)[source]

Schedule the exploration tasks.

class pfd.op.stage.StageSchedulerDist[source]

Bases: StageScheduler

schedule(scheduler: Scheduler, init_model: Path | None = None, current_model: Path | None = None, expl_model: Path | None = None, **kwargs) → Tuple[Path | None, Path | None, Path | None][source]

Schedule the exploration tasks in distributed mode.

class pfd.op.stage.StageSchedulerFT[source]

Bases: StageScheduler

schedule(scheduler: Scheduler, init_model: Path | None = None, current_model: Path | None = None, expl_model: Path | None = None, **kwargs) → Tuple[Path | None, Path | None, Path | None][source]

Schedule the exploration tasks in distributed mode.
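The relationship between the abstract schedule method and its concrete overrides can be sketched with standalone toy classes. Nothing here comes from pfd itself: the class names, the `run` entry point, the file name `init.pb`, and the policy inside `schedule` are all hypothetical. The sketch only shows the pattern of an abstract base delegating to a subclass that returns a three-element tuple of optional model paths.

```python
from abc import ABC, abstractmethod
from pathlib import Path
from typing import Any, Optional, Tuple

Models = Tuple[Optional[Path], Optional[Path], Optional[Path]]


class ToyStageScheduler(ABC):
    """Hypothetical stand-in for an abstract stage scheduler."""

    def run(self, scheduler: Any, **models: Optional[Path]) -> Models:
        # The shared entry point delegates the stage-specific decision to schedule().
        return self.schedule(scheduler, **models)

    @abstractmethod
    def schedule(
        self,
        scheduler: Any,
        init_model: Optional[Path] = None,
        current_model: Optional[Path] = None,
        expl_model: Optional[Path] = None,
        **kwargs: Any,
    ) -> Models:
        """Return (init_model, current_model, expl_model) for the next stage."""


class ToyCurrentFirst(ToyStageScheduler):
    """Illustrative policy only: explore with the current model if present, else the initial one."""

    def schedule(self, scheduler, init_model=None, current_model=None,
                 expl_model=None, **kwargs) -> Models:
        return init_model, current_model, current_model or init_model


result = ToyCurrentFirst().run(scheduler=None, init_model=Path("init.pb"))
```

Because `schedule` is abstract, instantiating `ToyStageScheduler` directly raises `TypeError`. Each concrete scheduler supplies its own policy while sharing the same entry point.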

pfd.op.task_gen module

Module contents