Setting Parameters of RiD-kit
RiD-kit uses a JSON-format file (typically rid.json) to configure simulations. Here we explain these parameters one by one.
Overall Settings
name (str): the name of this task.
numb_walkers (int): the number of parallel walkers used for exploration. The Exploration step runs numb_walkers trajectories in parallel, and the data from all walkers are pooled to train the neural networks.
numb_iters (int): the maximum number of iterations of the RiD workflow. Because the workflow is easy to continue and rerun, an exact value is not critical. A value greater than 5 is recommended, typically around 10 for a first attempt.
trust_lvl_1 (int) and trust_lvl_2 (int) (written $e_0$ and $e_1$ in the published papers, $\epsilon_0$ and $\epsilon_1$ in the equations below): two thresholds that control the bias forces and the data selection. In biased simulations, the bias force is scaled according to the model deviation $\epsilon$:
$$
F(r) = -\nabla_{r_i} U(r) + \sigma(\epsilon(s(r))) \nabla_{r_i} A(r)
$$
$$
\sigma(\epsilon) =
\begin{cases}
1, & \epsilon < \epsilon_0 \\
\frac{1}{2} + \frac{1}{2}\cos\left(\pi \frac{\epsilon - \epsilon_0}{\epsilon_1 - \epsilon_0}\right), & \epsilon_0 < \epsilon < \epsilon_1 \\
0, & \epsilon > \epsilon_1
\end{cases}
$$
In data selection, data are collected if their model deviation is greater than trust_lvl_1. In the adaptive version of RiD, these two values are the initial trust levels and are adjusted during the simulation according to the number of clusters.
init_models (List[str]): initial guesses of the neural networks. Usually nothing is known about the system in advance, so this is set to [].
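The switching function $\sigma(\epsilon)$ used to scale the bias force can be written out in a few lines. The following is only an illustrative sketch, not RiD-kit's own code: the function name bias_scale is hypothetical, and the boundary values $\epsilon = \epsilon_0$ and $\epsilon = \epsilon_1$ are assigned to the outer branches, which is consistent because the function is continuous there.

```python
import math


def bias_scale(eps, trust_lvl_1, trust_lvl_2):
    """Scale factor sigma(eps) applied to the bias force.

    eps is the model deviation; trust_lvl_1 and trust_lvl_2 play the
    roles of epsilon_0 and epsilon_1. (Hypothetical helper name.)
    """
    if eps <= trust_lvl_1:
        return 1.0  # models agree: apply the full bias force
    if eps >= trust_lvl_2:
        return 0.0  # models disagree strongly: switch the bias off
    # Smooth cosine switch between the two trust levels.
    x = (eps - trust_lvl_1) / (trust_lvl_2 - trust_lvl_1)
    return 0.5 + 0.5 * math.cos(math.pi * x)
```

With trust_lvl_1 = 2 and trust_lvl_2 = 3, a model deviation halfway between the two levels (2.5) gives a scale factor of one half.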
Example
"name": "test",
"numb_walkers": 2,
"numb_iters": 20,
"trust_lvl_1": 2,
"trust_lvl_2": 3,
"init_models": [],
CV setting
This section configures collective variables (CVs). RiD-kit provides two modes to configure CV: "torsion" and "custom".
Torsion mode
In torsion mode, RiD-kit uses the torsions (dihedral angles) of proteins as collective variables. Set "mode": "torsion". selected_resid, angular_mask and weights must not be none or empty in torsion mode.
selected_resid (List[int]): residue ids (starting from 1) of the selected residues. The two dihedral angles of each selected residue, $\phi$ and $\psi$, are used. Note that the first residue of a chain (N-terminal) has no $\phi$ and the last residue of a chain (C-terminal) has no $\psi$.
angular_mask (List[int]): the mask of angular (periodic) CVs, 1 for periodic and 0 for non-periodic. In torsion mode all CVs are periodic, so this should be a list of 1s whose length equals the number of CVs.
weights (List[int]): weights that scale the CV values. They are used in clustering when computing Euclidean distances between CVs, which prevents a CV with a much larger range than the others from dominating the distance.
Custom mode
RiD-kit also supports user-defined collective variables.
Set "mode": "custom". cv_file, angular_mask and weights must not be none or empty in custom mode.
cv_file (str): path to a CV file that defines the collective variables in PLUMED2 format. In principle, any CV that PLUMED2 supports can be used in RiD-kit.
angular_mask (List[int]): the mask of angular (periodic) CVs, 1 for periodic and 0 for non-periodic. In custom mode, identify the periodic CVs and set 1 at the corresponding positions in the list.
weights (List[int]): the same as in torsion mode.
Example
"CV": {
"mode": "torsion",
"selected_resid": [ 1, 2 ],
"angular_mask": [ 1, 1 ],
"weights": [ 1, 1 ],
"cv_file":""
},
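The rules above for the "CV" section can be checked before a workflow is submitted. The sketch below is a hypothetical helper (validate_cv_config is not part of RiD-kit) that encodes only what this section states, plus one assumption: that angular_mask and weights each hold one entry per CV and therefore have the same length.

```python
def validate_cv_config(cv):
    """Return a list of problems found in a "CV" section (empty if none)."""
    mode = cv.get("mode")
    if mode == "torsion":
        required = ["selected_resid", "angular_mask", "weights"]
    elif mode == "custom":
        required = ["cv_file", "angular_mask", "weights"]
    else:
        return [f"unknown mode: {mode!r}"]

    problems = []
    for key in required:
        if not cv.get(key):  # None, "" and [] all count as missing
            problems.append(f"{key} must not be none or empty in {mode} mode")

    mask = cv.get("angular_mask") or []
    weights = cv.get("weights") or []
    if any(m not in (0, 1) for m in mask):
        problems.append("angular_mask entries must be 0 or 1")
    # Assumption: one mask entry and one weight per CV.
    if len(mask) != len(weights):
        problems.append("angular_mask and weights must have the same length")
    if mode == "torsion" and any(m != 1 for m in mask):
        problems.append("all CVs are periodic in torsion mode; use only 1s")
    return problems
```

Applied to the torsion example above, the function returns an empty list.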
ExploreMDConfig
This section configures the parameters of the Exploration step. The MD engine is Gromacs patched with PLUMED2, and the parameter conventions follow Gromacs and PLUMED2.
nsteps (int): number of MD steps in the Exploration step.
output_freq (int): frame output frequency of the MD simulation. A recommended value is nsteps/1000, so that at least 1000 frames are generated during exploration.
temperature (int): temperature of the MD simulation. Make sure this value matches the temperature in the Label step unless you intend them to differ.
dt (float): time step of the MD simulation. 0.002 is recommended for normal simulations. A larger time step, e.g. 0.004, may be used when heavy hydrogens are used in Gromacs.
output_mode (str): trajectory output mode. Options are "both", "single", "double" and "none".
"both": generate both full-precision .trr and compressed-precision .xtc trajectories during the MD simulation.
"single": generate only compressed-precision .xtc trajectories.
"double": generate only full-precision .trr trajectories.
"none": do not generate trajectory files (for tasks that only need PLUMED2 output).
ntmpi (int): number of thread-MPI ranks to start (0 means guess). See the Gromacs manual for details.
nt (int): total number of threads to start (0 means guess). See the Gromacs manual for details.
max_warning (int): maximum number of warnings allowed in the gmx grompp step. See the Gromacs manual for details.
Example
"ExploreMDConfig": {
"nsteps": 25000,
"output_freq": 25,
"temperature": 300,
"dt": 0.002,
"output_mode": "both",
"ntmpi": 1,
"nt": 8,
"max_warning": 0
},
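The relation between nsteps, dt and output_freq is simple arithmetic, sketched below with hypothetical helper names. It assumes dt is given in picoseconds, as in Gromacs, and ignores the initial frame that an MD engine may also write.

```python
def recommended_output_freq(nsteps, min_frames=1000):
    """Largest output interval that still yields at least min_frames frames."""
    return max(1, nsteps // min_frames)


def exploration_summary(nsteps, dt, output_freq):
    """Total simulated time (ps, assuming dt in ps) and number of saved frames."""
    return {
        "total_time_ps": nsteps * dt,
        "frames": nsteps // output_freq,
    }
```

For the example above (nsteps = 25000, dt = 0.002, output_freq = 25), this corresponds to about 50 ps of simulation and 1000 saved frames per walker.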
SelectorConfig
This section configures the parameters of the Selection step. In the Selection step, all CV values are clustered. Data with high model deviation are then selected, collected and sent to the Label step.
cluster_threshold (int): initial guess of the cluster threshold. Note that the threshold actually used is derived from this guess.
numb_cluster_lower (int) and numb_cluster_upper (int): these two values form a closed interval [numb_cluster_lower, numb_cluster_upper] used to find a proper cluster threshold. Starting from the initial guess, the threshold is adjusted until the number of clusters falls into the interval. This adjustment happens only in the first iteration. In the following iterations the threshold is fixed, and in the adaptive version of RiD the trust level is adjusted instead. See the published paper for details.
max_selection (int): the maximum number of clusters selected during the Selection step. If the number of clusters exceeds this value, only the first max_selection clusters are selected.
numb_cluster_threshold (int): if the number of clusters found in the exploration trajectories at the current iteration is less than this value, the trust level is adjusted. See the published paper for details. A recommended value is half of numb_cluster_lower.
slice_mode (str): how RiD-kit extracts the selected frames from the MD trajectories. Options are "gmx" and "mdtraj". The gmx mode uses the Gromacs command gmx trjconv to slice trajectories, and the mdtraj mode uses the mdtraj Python interface. We highly recommend the gmx mode because of a known mdtraj bug (#Issue1514) that changes .gro topology names.
Example
"SelectorConfig": {
"cluster_threshold": 1,
"numb_cluster_lower": 16,
"numb_cluster_upper": 26,
"max_selection": 30,
"numb_cluster_threshold": 8,
"slice_mode": "gmx"
},
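The first-iteration threshold adjustment described above can be pictured as a simple search loop. The sketch below is not RiD-kit's algorithm: the toy one-dimensional clustering, the multiplicative update with factor 1.1 and all function names are assumptions made purely for illustration. It only shows the idea that a larger threshold merges more points and so yields fewer clusters.

```python
from itertools import accumulate


def count_clusters_1d(points, threshold):
    """Toy clustering: start a new cluster whenever the gap between
    neighbouring sorted points exceeds the threshold."""
    pts = sorted(points)
    return 1 + sum(1 for a, b in zip(pts, pts[1:]) if b - a > threshold)


def tune_threshold(points, init_threshold, lower, upper, factor=1.1, max_iter=200):
    """Adjust the threshold until the cluster count lies in [lower, upper]."""
    threshold = init_threshold
    for _ in range(max_iter):
        n = count_clusters_1d(points, threshold)
        if n > upper:
            threshold *= factor  # too many clusters: merge more points
        elif n < lower:
            threshold /= factor  # too few clusters: split more finely
        else:
            return threshold, n
    raise RuntimeError("no threshold found within max_iter adjustments")


# Toy data whose neighbouring gaps grow steadily (2, 3, ..., 40).
points = list(accumulate(range(1, 41)))
threshold, n_clusters = tune_threshold(points, init_threshold=10, lower=16, upper=26)
```

In this toy run the initial guess gives too many clusters, so the threshold is raised step by step until the count falls inside the interval.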
LabelMDConfig
This section configures the parameters of the Label step. Most settings are similar to those of the Exploration step. In the Label step, RiD-kit performs restrained MD simulations in which harmonic restraints are applied to the CVs. These simulations are usually much shorter than those in Exploration.
kappas (List[int]): a list of force constants ($\kappa$) of the harmonic restraints. The length of the list equals the number of CVs.
nsteps (int): number of MD steps in the restrained simulations.
output_freq (int): frame output frequency of the MD simulation. A recommended value is nsteps/1000, so that at least 1000 frames are generated.
temperature (int): temperature of the MD simulation. Make sure this value matches the temperature in the Exploration step unless you intend them to differ.
dt (float): time step of the MD simulation. 0.002 is recommended for normal simulations. Reduce it if you use very large harmonic force constants, otherwise the simulation may become numerically unstable.
output_mode (str): trajectory output mode. Options are "both", "single", "double" and "none".
"both": generate both full-precision .trr and compressed-precision .xtc trajectories during the MD simulation.
"single": generate only compressed-precision .xtc trajectories.
"double": generate only full-precision .trr trajectories.
"none": do not generate trajectory files (for tasks that only need PLUMED2 output).
ntmpi (int): number of thread-MPI ranks to start (0 means guess). See the Gromacs manual for details.
nt (int): total number of threads to start (0 means guess). See the Gromacs manual for details.
max_warning (int): maximum number of warnings allowed in the gmx grompp step. See the Gromacs manual for details.
Example
"LabelMDConfig": {
"kappas": [ 500, 500 ],
"nsteps": 25000,
"output_freq": 50,
"temperature": 300,
"dt": 0.002,
"output_mode": "both",
"ntmpi": 1,
"nt": 8,
"max_warning": 0
},
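To give a feel for what kappas controls, the sketch below evaluates a harmonic restraint energy of the usual form $\frac{1}{2}\kappa (s - s_0)^2$ summed over CVs. It is an illustration under stated assumptions, not RiD-kit or PLUMED2 code: the function name is hypothetical, and periodic CVs are assumed to be angles in radians whose difference is wrapped into $[-\pi, \pi)$.

```python
import math


def restraint_energy(cv_values, centers, kappas, angular_mask):
    """Sum of 0.5 * kappa * (s - s0)^2 over all CVs (hypothetical helper)."""
    if not (len(cv_values) == len(centers) == len(kappas) == len(angular_mask)):
        raise ValueError("all inputs need one entry per CV")
    energy = 0.0
    for s, s0, kappa, periodic in zip(cv_values, centers, kappas, angular_mask):
        d = s - s0
        if periodic:
            # Assumption: angles in radians; take the shortest signed difference.
            d = (d + math.pi) % (2.0 * math.pi) - math.pi
        energy += 0.5 * kappa * d * d
    return energy
```

With kappas = [500, 500], a displacement of 0.1 in one CV already costs about 2.5 energy units, which is why very large force constants may call for a smaller dt.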
Train
This section configures the parameters of the Train step. RiD-kit is based on TensorFlow.
numb_models (int): number of models trained in the Train step. RiD-kit uses the model deviation (the standard deviation of the outputs of these models) to evaluate the quality of the free energy surface, so numb_models must be greater than 1.
neurons (List[int]): the number of neurons in each layer. RiD-kit uses an MLP as the basic neural network structure. The number of elements in the list is the number of hidden layers, and each element is the number of nodes in that layer. For example, [ 50, 50, 50, 50 ] means 4 hidden layers with 50 neurons each.
resnet (bool): whether to use residual connections between layers. If true, all layers must have the same number of nodes.
epoches (int): number of training epochs.
init_lr (float): initial learning rate. It decays exponentially during training.
decay_steps (int): decay steps of the learning rate. See the TensorFlow API docs for details.
decay_rate (float): decay rate of the learning rate. See the TensorFlow API docs for details.
drop_out_rate (float): dropout rate of the dropout layers.
numb_threads (int): number of threads used for training.
Example
"Train": {
"numb_models": 4,
"neurons": [ 50, 50, 50, 50 ],
"resnet": true,
"batch_size": 32,
"epoches": 2000,
"init_lr": 0.0008,
"decay_steps": 120,
"decay_rate": 0.96,
"drop_out_rate": 0.1,
"numb_threads": 8,
"use_mix": false,
"restart": false
}
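The interplay of init_lr, decay_steps and decay_rate follows the standard exponential decay formula, lr = init_lr * decay_rate ^ (step / decay_steps). The plain-Python sketch below reproduces that formula for illustration; the function name is hypothetical, and whether RiD-kit uses the continuous or the staircase variant is not stated here, so both are shown.

```python
def learning_rate(step, init_lr, decay_steps, decay_rate, staircase=False):
    """Exponentially decayed learning rate at a given training step."""
    if staircase:
        exponent = step // decay_steps  # decay in discrete jumps
    else:
        exponent = step / decay_steps   # decay continuously
    return init_lr * decay_rate ** exponent
```

With the example values (init_lr = 0.0008, decay_steps = 120, decay_rate = 0.96), the learning rate is multiplied by 0.96 every 120 steps.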
A full Example of rid.json
You can find a full example of rid.json in "rid-kit/rid/template", or copy the one below:
{
"name": "test",
"numb_walkers": 2,
"numb_iters": 20,
"trust_lvl_1": 2,
"trust_lvl_2": 3,
"init_models": [],
"CV": {
"mode": "torsion",
"selected_resid": [ 1, 2 ],
"angular_mask": [ 1, 1 ],
"weights": [ 1, 1 ],
"cv_file":""
},
"ExploreMDConfig": {
"nsteps": 25000,
"output_freq": 25,
"temperature": 300,
"dt": 0.002,
"output_mode": "both",
"ntmpi": 1,
"nt": 8,
"max_warning": 0
},
"SelectorConfig": {
"numb_cluster_lower": 16,
"numb_cluster_upper": 26,
"cluster_threshold": 1,
"max_selection": 30,
"numb_cluster_threshold": 8,
"slice_mode": "gmx"
},
"LabelMDConfig": {
"nsteps": 25000,
"output_freq": 50,
"temperature": 300,
"dt": 0.002,
"output_mode": "both",
"ntmpi": 1,
"nt": 8,
"max_warning": 0,
"kappas": [ 500, 500 ]
},
"Train": {
"numb_models": 4,
"neurons": [ 50, 50, 50, 50 ],
"resnet": true,
"batch_size": 32,
"epoches": 2000,
"init_lr": 0.0008,
"decay_steps": 120,
"decay_rate": 0.96,
"drop_out_rate": 0.1,
"numb_threads": 8,
"use_mix": false,
"restart": false
}
}
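Before launching a workflow it can be useful to confirm that rid.json parses and contains the sections described in this document. The sketch below is a hypothetical pre-flight check, not part of RiD-kit. It assumes all five sections are required and checks only one rule stated above, namely that numb_models must be greater than 1.

```python
import json

# Sections described in this document (assumed to be required).
REQUIRED_SECTIONS = ("CV", "ExploreMDConfig", "SelectorConfig", "LabelMDConfig", "Train")


def load_rid_config(path):
    """Load a rid.json file and run a few basic sanity checks."""
    with open(path) as f:
        config = json.load(f)  # raises json.JSONDecodeError on invalid JSON

    missing = [key for key in REQUIRED_SECTIONS if key not in config]
    if missing:
        raise ValueError(f"missing sections: {', '.join(missing)}")
    if config["Train"]["numb_models"] <= 1:
        raise ValueError("numb_models must be greater than 1")
    return config


# Usage: config = load_rid_config("rid.json")
```

Python's standard json module rejects trailing commas, so loading the file this way also catches that kind of syntax error early.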