Input Script Guide

Basics

Host and Nodes

Workflow Host

PFD-kit is built on the dflow package, which uses the Python API of ARGO workflows. While designed for cloud-based workflows with Kubernetes, a local “debug” mode is available for convenience. No cloud services are required for local execution.

To submit workflows to a remote server, specify the following configuration:

"dflow_config": {
    "host": "http://address.of.the.host:port"
},
"dflow_s3_config": {
    "endpoint": "address.of.the.s3.server:port"
},

For the Bohrium platform, use:

"bohrium_config": {
    "username": "your_username",
    "password": "your_password",
    "project_id": 123456,
    "_comment": "all"
},

The workflow will be hosted on https://workflows.deepmodeling.com, and progress can be tracked at https://workflows.deepmodeling.com/workflows.

Node Settings

The step_configs section defines computational resources for tasks like DFT calculations, model training, and MD exploration. For local execution, no configuration is needed, as all tasks run on the local machine.

For remote HPC nodes (e.g., Slurm systems), configure as follows:

"step_configs": {
    "run_fp_config": {
        "template_config": {},
        "executor": {
            "type": "dispatcher",
            "host": "your_host",
            "username": "your_username",
            "password": "your_password",
            "port": 22,
            "private_key_file": null,
            "remote_root": "/remote_root",
            "queue_name": "queue",
            "machine_dict": {
                "remote_profile": {
                    "timeout": 600
                }
            },
            "resources_dict": {
                "source_list": ["path_to_source_file"],
                "module_list": ["remote_module"],
                "custom_flags": ["custom_commands"]
            }
        },
        "template_slice_config": {
            "group_size": 1,
            "pool_size": 1
        }
    }
}

For Kubernetes services (e.g., Bohrium), specify the container image and machine type:

"step_configs": {
    "run_fp_config": {
        "template_config": {
            "image": "vasp_image_paths"
        },
        "continue_on_success_ratio": 0.9,
        "executor": {
            "type": "dispatcher",
            "image_pull_policy": "IfNotPresent",
            "machine_dict": {
                "batch_type": "Bohrium",
                "context_type": "Bohrium",
                "remote_profile": {
                    "input_data": {
                        "job_type": "container",
                        "platform": "ali",
                        "scass_type": "c32_m64_cpu"
                    }
                }
            }
        }
    }
}

Fine-Tuning

The example si_ft.json file defines workflow tasks. Specify the task type (e.g., “finetune”) and skip initial data generation and training if the Domains_SemiCond branch already provides sufficient accuracy:

"task": {
    "type": "finetune",
    "max_iter": 5,
    "init_ft": false,
    "init_train": false
}

The inputs section includes essential parameters and model files. Prepare exploration systems with proper perturbation in advance:

"inputs": {
    "base_model_path": "DPA2_medium_28_10M_beta4.pt",
    "init_confs": {
        "prefix": "./",
        "confs_paths": ["./pert_si32.extxyz"]
    },
    "init_fp_confs": {
        "prefix": "./",
        "confs_paths": []
    }
}

The exploration section defines structural configuration exploration. In this example, MD simulations at 1000 K under varying pressures generate new configurations:

"exploration": {
    "type": "ase",
    "config": {
        "calculator": "dp",
        "head": "Domains_SemiCond"
    },
    "stages": [
        [
            {
                "conf_idx": [0],
                "n_sample": 1,
                "ens": "npt",
                "dt": 2,
                "nsteps": 2000,
                "temps": [1000],
                "press": [1, 1000, 10000],
                "trj_freq": 10
            }
        ]
    ]
}

The select_confs node filters unphysical configurations and compresses data using entropy-based measures:

"select_confs": {
    "max_sel": 60,
    "frame_filter": [
        {"type": "distance"}
    ],
    "h_filter": {
        "chunk_size": 5,
        "_comment": "entropy-based filter"
    }
}

The fp section defines DFT calculation settings, including VASP input and pseudopotential files:

"fp": {
    "type": "vasp",
    "task_max": 50,
    "run_config": {
        "command": "mpirun -n 32 vasp_std"
    },
    "inputs_config": {
        "incar": "INCAR.fp",
        "pp_files": {
            "Si": "POTCAR"
        },
        "kspacing": 0.2
    }
}

The train section specifies the pretrained model and training configuration:

"train": {
    "type": "dp",
    "config": {
        "impl": "pytorch",
        "head": "Domains_SemiCond"
    },
    "template_script": "train.json"
}

The evaluate section tests the model against a test dataset. Iterations continue until convergence or the maximum cycle limit is reached. Convergence is achieved when the force RMSE falls below 0.06 eV/Å:

"evaluate": {
    "test_size": 0.3,
    "model": "dp",
    "head": "Domains_SemiCond",
    "_comment": "The percentage for test",
    "converge": {
        "type": "force_rmse",
        "RMSE": 0.06
    }
}

Distillation

The distillation script is similar to fine-tuning, with key differences. Change the task/type to dist and label new frames using the fine-tuned model:

"task": {
    "type": "dist",
    "max_iter": 5
},
"inputs": {
    "base_model_path": "path_to_teacher_model",
    ...
},
"train": {
    "type": "dp",
    "config": {
        "impl": "pytorch"
    },
    "template_script": "./dist_train.json"
},
"fp": {
    "type": "ase",
    "run_config": {
        "model_style": "dp",
        "inputs_config": {
            "batch_size": 500
        }
    }
}