High-energy structure search

Table of contents

Introduction

After training the models, we can finally start searching for high-error structures in our system. This is done by running molecular dynamics (MD) trajectories using forces from one of the trained models searching for potential energy surface holes across the trajectories. An excellent environment to run such simulations is NQCDynamics.jl within Julia.

In order to run the dynamics, we usually create the initial conditions (initial positions and velocities) separately, in order to be able to access it multiple times, if needed. This part is very well explained in the NQCDynamics.jl documentation. Our initial conditions scripts for dissociative chemisorption included in this repository could be also helpful (link1, link2).

Having both the ML-based PES models and a set of initial conditions, we can finally run MD with a high-error structure search. Below we will show an example script, which can also be accessed directly here.

The following sections will include Julia-based code.

Importing packages

First, we import Julia-based and Python-based (PyCall) packages.

using NQCDynamics
using NQCDynamics.InitialConditions.QuantisedDiatomic
using PyCall
using DelimitedFiles
using Unitful
using UnitfulAtomic
using SciMLBase
using LinearAlgebra
using Statistics
using JLD2

# Importing Python modules with PyCall
io = pyimport("ase.io")
db = pyimport("ase.db")
spk_utils = pyimport("schnetpack.utils")
spk_interfaces = pyimport("schnetpack.interfaces")

Initial structures and functions

Next, we define a couple of functions and structures that will allow us to create certain MD callbacks or to analyze our results.

Trajectory terminator

Below, we define an MD trajectory terminator, which stops the trajectory when certain conditions are met. Here, the trajectory is stopped if H2 is above scat_cutoff value or if the distance between hydrogens in H2 exceeds dist_cutoff value.

mutable struct TrajectoryTerminator
    h2_indices
    scat_cutoff
    dist_cutoff
    slab_range_dif
    slab_ranges
    n_atoms_layer
end
function (term::TrajectoryTerminator)(u, t, integrator)::Bool
    R = NQCDynamics.get_positions(u)
    com_h2_z = minimum(R[3,term.h2_indices[1]:term.h2_indices[2]])
    top_surface_avg_z = sum(R[3,end-Int(term.n_atoms_layer)-1:end-2])/term.n_atoms_layer

    if (com_h2_z - top_surface_avg_z) > term.scat_cutoff # [scat_cutoff] Ang above the surface
        return true
    elseif norm(R[:,term.h2_indices[1]] .- R[:,term.h2_indices[2]]) > term.dist_cutoff # at least [dist_cutoff] Ang between H2 = reaction
        return true
    else
        return false
    end
end

High-error structure saver

Structure OutputTrajectory with a function that is executed at the end of every trajectory. It allows calculating energies with all the models from the models_mult list. After that, the error (standard deviation) is calculated and compared to the manually chosen v_models_error, which is the minimum error required for the structure to be saved.

If the calculated error is higher than v_models_error, the structure is saved to the db_out database stored in a specified path.

In order to save memory, only the last structure is returned by our output function.

struct OutputTrajectory 
    atoms
    cell
    path
    surface
    e_tran
    models_mult
    selection_start
    v_models_error
    save_errors
end

function (out::OutputTrajectory)(sol, i)
    if out.save_errors == true
        atoms_all = []
        errors = []
        Vs = []
        cur_error = 0
        db_out = db.connect("$(out.path)output_h2$(out.surface)_E$(out.e_tran).db")

        for (n, e) in enumerate(sol.u)
            for (i1, m) in enumerate(out.models_mult)
                append!(Vs, potential(out.models_mult[i1], NQCDynamics.get_positions(e)))
            end
            cur_error = std(Vs)
            if cur_error >= out.v_models_error
                append!(errors, cur_error)
                cur_ase_atoms = NQCDynamics.convert_to_ase_atoms(out.atoms, NQCDynamics.get_positions(e), out.cell)
                append!(atoms_all, [cur_ase_atoms])
                data=Dict()
                data["surface"] = out.surface
                data["traj_id"] = out.selection_start + i - 1
                data["atoms_id"] = n
                data["error"] = cur_error
                data["e_transl"] = out.e_tran
                db_out.write(cur_ase_atoms, data=data)
            end
        end
    end

    return last(sol.u) # return last structure
end

Postprocessing

Function ensemble_processing is used after MD simulation is over, to postprocess our simulation data. In this case of simulating dissociative chemisorption of H2 on Cu surface, we iterate over all the trajectories (the last structure of every trajectory), to check whether the trajectory ended up with a scattering of H2 molecule from the surface or with the reaction (sticking), to be able to calculate sticking probability in further steps.

function ensemble_processing(ensemble, dist_cutoff, scat_cutoff, atoms, cell, surface, e_tran, cur_folder, n_atoms_layer)
    atoms_all = []
    output_data = []
    n_scat = 0
    n_reac = 0
    n_nondef = 0
    h2_distance = 0
    h2_surface_distance = 0
    scattering = 0
    reaction = 0
    non_defined = 0

    db_out = db.connect("$(cur_folder)output_last_str_h2$(surface)_E$(e_tran).db")

    for (n, e) in enumerate(ensemble) # for each traj
        e = e[:OutputTrajectory]
        scattering = 0
        reaction = 0
        non_defined = 0
        h2_distance = norm(NQCDynamics.get_positions(e)[:,end].-NQCDynamics.get_positions(e)[:,end-1])
        top_surface_avg_z = sum(NQCDynamics.get_positions(e)[3,end-Int(n_atoms_layer)-1:end-2])/n_atoms_layer
        h2_surface_distance = minimum(NQCDynamics.get_positions(e)[3,end-1:end]) - top_surface_avg_z # min H2 z - max Cu z value
        cur_ase_atoms = NQCDynamics.convert_to_ase_atoms(atoms, NQCDynamics.get_positions(e), cell)

        if h2_surface_distance >= scat_cutoff  # Scattering event
            n_scat += 1
            scattering = 1
        elseif h2_distance >= dist_cutoff            # DC event
            n_reac += 1
            reaction = 1
        else                              # Non-defined category
            n_nondef += 1
            non_defined = 1
        end

        data=Dict()
        data["surface"] = surface
        data["scattering"] = scattering
        data["reaction"] = reaction
        data["non_defined"] = non_defined
        db_out.write(cur_ase_atoms, data=data)
    end
    return output_data, atoms_all, n_scat, n_reac, n_nondef
end

Model initializer

Function schnet_model_pes is used for creating an NQCDynamics model object utilizing the ASE calculator (here SchNet calculator).

This function can be replaced with any MLIP that can be accessed through the ASE calculator.

function schnet_model_pes(model_path, cur_atoms)
    spk_model = spk_utils.load_model(model_path;map_location="cpu")
    model_args = spk_utils.read_from_json("$(model_path)/args.json")
    environment_provider = spk_utils.script_utils.settings.get_environment_provider(model_args,device="cpu")
    calculator = spk_interfaces.SpkCalculator(spk_model, energy="energy", forces="forces", environment_provider=environment_provider)
    cur_atoms.set_calculator(calculator)
    model = AdiabaticASEModel(cur_atoms)

    return model
end

Simulation settings

Now, we set all the required paths and we create the output folders.

ml_model_f = "path/to/mlip/model"
icond_f = "path/to/initial/conditions/folder"
icond_str = "path/to/initial/structures/folder/"
output_f = "path/to/output/folder/"
mkpath(output_f)

We then choose settings for high-error structures saver (adaptive sampling).

save_errors = true # choose if you want to save structures with min error of v_models_error for adaptive sampling 
v_models_error = 0.025 # minimum error (std) of potential energy predictions made by multiple models, of the structure that will be saved for adaptive sampling
models_mult = []
models_mult_paths = [] # this can be done in a couple of ways, but e.g. add paths to different models here
atoms_all = []

Next, we set all the simulation details.

height = 7.0 # H height / Ang 
traj_start = parse(Int64, ENV["TRAJ_START"]) # starting trajectory (based on initial conditions)
traj_end = parse(Int64, ENV["TRAJ_END"]) # last trajectory (based on initial conditions)
traj_num = traj_end - traj_start + 1 # number of trajectories to run
max_time_fs = 3000 # simulation time / fs
max_time = max_time_fs*u"fs"
step = 0.1u"fs" # simulation step
scat_cutoff = ang_to_au(7.1) # termination condition (H2 height)
dist_cutoff = ang_to_au(2.25) # termination condition (H2 bond length)
slab_outside_max_len = ang_to_au(10)
surface = "cu111" # surface
e_tran = 0.5 # translational/collision energy in eV
vjs = [0,1] # v and J states
temp_surf = "925" # temperature of surface
n_layers_metal = 6

cur_folder = "$(output_f)$(surface)/v$(vjs[1])J$(vjs[2])/T$(temp_surf)/"
mkpath(cur_folder)

Reading the initial conditions files

file_name = "Et$(e_tran)_v$(vjs[1])_J$(vjs[2])/distr_Et$(e_tran)_v$(vjs[1])_J$(vjs[2])"
system = load("$(icond_f)$(surface)/T$(temp_surf)K/$(file_name).jld2")
ase_atoms = io.read("$(icond_str)$(surface)_h$(height)_full_$(temp_surf)K.in")
cell = system["cell"]
distribution = system["dist"]
atoms = system["atoms"]
atoms = NQCDynamics.Atoms(atoms.types)
n_atoms_layer = (length(ase_atoms)-2)/n_layers_metal # number of atoms in a metal layer (if the molecule contains 2 atoms)

Preparing the simulation

Loading models

We first load models for high error structure search (adaptive sampling). To do that, we load all of the models with paths included in the models_mult_paths list, using the previously mentioned schnet_model_pes function.

if save_errors == true
    for (i, m) in enumerate(models_mult_paths) # 
        append!(models_mult, [schnet_model_pes(models_mult_paths[i], ase_slab.copy())])
    end
end

Then, we load a model used for MD and we initialize the Simulation.

model = schnet_model_pes(ml_model_f, ase_atoms)
sim = Simulation{Classical}(atoms, model, cell=cell)

Set trajectory terminator

In order to terminate the trajectories whenever the simulation has reached its goal, we set the terminating callback, which will be executed at every step of the simulation.

terminator = TrajectoryTerminator([length(ase_atoms)-1,length(ase_atoms)], scat_cutoff, dist_cutoff, slab_outside_max_len, 
                        [[ang_to_au(0.0), cell.vectors[1,1] + cell.vectors[1,2]], [ang_to_au(0.0), cell.vectors[2,2]], [ang_to_au(0.0), ang_to_au(15.0)]],Int(n_atoms_layer))

terminate_cb = DynamicsUtils.TerminatingCallback(terminator)

Running the ensemble MD simulation

Finally, we use all the model parameters established in the previous steps and we run the ensemble dynamics using the Ensembles.run_dynamics function.

All the trajectories will finish either after reaching the maximum simulation time max_time_fs or whenever the callback function terminate_cb is satisfied.

ensemble = Ensembles.run_dynamics(sim, (0.0, max_time), distribution; selection=traj_start:traj_end, dt=step, trajectories=traj_num, 
                        output=OutputTrajectory(atoms, cell, cur_folder, surface, string(e_tran), models_mult, traj_start, v_models_error, save_errors),
                        callback=terminate_cb, ensemble_algorithm=EnsembleDistributed(), saveat=(0.0:austrip(1.0*u"fs"):austrip(max_time_fs*u"fs")))

Postprocess

After the simulation is over, we execute the ensemble_processing function that allows us to calculate our final reaction (sticking) probability prob_reac_all.

output_data, atoms_all, n_scat, n_reac, n_nondef = ensemble_processing(ensemble, dist_cutoff, scat_cutoff, atoms, cell, surface, e_tran, cur_folder, Int(n_atoms_layer))

n_traj_all = n_scat + n_reac + n_nondef
prob_reac_all = n_reac/n_traj_all

We end the simulation by printing the output file containing our final results.

labels = ["n_scattering: ", "n_reaction: ", "n_nondefined: ", "reaction_probability: "]
results =[n_scat, n_reac, n_nondef, prob_reac_all]
writedlm("$(cur_folder)$(surface)_Et$(e_tran)_results_$(traj_start)to$(traj_end).log", zip(labels,results))