Supplemental arguments - User Guide to Protean 3D

The following table shows supplemental arguments when scripting a local Nova application prediction. Only a subset of these arguments is available, depending on the application. The three rightmost columns show whether the argument applies to NF (NovaFold), NFA (NovaFold Antibody) or ND (NovaDock).

Name	Definition	Defaults	NF	NFA	ND
--verbose, -v	To enable verbose output.		✔	✔	✔
--welcome	To show a welcome message.		✔	✔	✔
--EC value	To predict enzyme active sites.	allowed: [true, false] default: false	✔
--enhancedSearch value	To perform DNASTAR’s experimental method for enhancing the structural diversity of the normal template set. For templates selected by protein threading, a proprietary process samples alternate structural conformations and replaces a subset of the templates with lower energy conformations. Note: This option typically adds 30-60 minutes to the prediction time but, in some cases, improves the accuracy of the prediction. We recommend running the prediction with and without this search option.	allowed: [true, false] default: false	✔
--GO value	To predict protein function GO terms.	allowed: [true, false] default: false	✔
--homoflag value	To use all templates or exclude homologs for benchmarking.	allowed: [real, benchmark] default: real	✔
--hours value	To set a maximum simulation runtime.	range: [1,200] default: 50	✔
--idcut value	To set a sequence identity cutoff for benchmarking.	range: [0,1] default: 0.3	✔
--LBS value	To predict ligand binding sites.	allowed: [true, false] default: false	✔
--light value	To enable fast mode (override ‘hours’ option to 5).	allowed: [true, false] default: false	✔
--nmodel value	To specify the maximum number of models to create.	range: [1,10] default: 5	✔
--ntemp value	To specify the maximum number of templates to be used from each threader.	range: [1,50] default: 20	✔
--restraint1 value	To provide a text file containing a collection of distance and/or contact restraints (e.g., active sites, zinc fingers, disulfide bonds): * Pairwise distances between two atoms (i and j) * Contact between two residues (i and j) If both Distance and Contact are specified, they are described in different rows in the same restraint file. Value represents the path and filename of the text file containing the distance and/or contact information. A file located outside of the datadir data directory will be copied into the datadir. IMPORTANT – The text file may not contain any lower-case letters. Note: For detailed information and an example, see the text below this table.		✔
--restraint2 value	To provide a text file containing a user-defined template structure and the alignment between that template and the query sequence. Value represents the path and filename of a text file containing the information below: * The pairwise FASTA-formatted sequence alignment between query and template. * The standard PDB format 3D structural coordinates of a single protein chain of the desired template. A file located outside of the datadir data directory will be copied into the datadir. The alignment file may only include one template sequence. Note: For detailed information and an example, see the text below this table.		✔
--restraint3 value	To nominate a specific single-chained PDB structure as a template in the modeling prediction, along with other templates selected by NovaFold. Value represents the PDB and chain desired for the user template in the format [PDB ID]:[CHAIN ID]. * The CHAIN ID is case sensitive. * An underscore (_) may be used to designate the first listed chain in the PDB. * Downloading the designated file from the Protein Data Bank requires Internet connectivity. Example: 7tim:A Since ‘A’ is the first chain, 7tim:_ would work as an alternative expression. For additional information, see the Note below this table.		✔
NovaFold: --restraint4 value NovaFold Antibody: --add ADD	To nominate a local 3D structure (in PDB format) as a template in the modeling prediction, along with other templates selected by NovaFold or NovaFold Antibody. Value and ADD represent the path and filename of a standard PDB format text file. NovaFold requires a single protein chain; NovaFold Antibody can accept multiple chains. For additional information, see the Note below this table.		✔	✔
NovaFold: --temp_excl value NovaFold Antibody: --exclude EXCLUDE	To exclude certain templates from the library (i.e., to prevent these templates from being considered) during structure prediction, where value and EXCLUDE represent the name of the file containing the list of structures to exclude. In both NovaFold and NovaFold Antibody, templates can be excluded solely by name. In addition, NovaFold lets you specify a sequence identity cutoff value, such that all templates with an identity at that threshold or higher are excluded. If no sequence identity cutoff is specified, a value of 100% is used. The tab-delimited file listing templates to exclude must have the following format: [PDB ID]:[CHAIN ID]. Example: 1wor:A A percent sequence identity can be specified at the end of the line, if desired. For example, 1wor:A 70 would specify a 70% sequence identity cutoff. An asterisk (*) may be used to designate any chain in the PDB file. For example, 1wor:*	default sequence identity cutoff: 100%	✔	✔
--include INCLUDE	To define one or more chains from a PDB structure as the template(s) to use in the modeling prediction. INCLUDE represents a file listing the only templates to use. Each line represents one template using the format [PDB ID]:[CHAIN ID] [HEAVY/LIGHT] * The PDB ID is four characters, and starts with a number, followed by three letters or numbers. * The CHAIN ID is case sensitive and is typically a single character. Examples: 1IGT:A Heavy 1IGT:A Light			✔
--models MODELS	The H3 loop is generally the hardest region of the antibody structure to predict. NovaFold Antibody offers a template-based approach that uses a machine learning model to choose the best templates for the H3 loop. The --models option lets you specify how many result models to output, each using a unique H3 loop template.	range: [1-10] default: 1		✔
--max-abinitio-h3 MAX_ABINITIO_H3	To specify a cutoff for switching from ab initio "Distance Guided" prediction of the H3 loop to a template-based prediction. Example: The default setting of ‘3’ means loops of length 3 or shorter would use the ab initio methodology, while loops longer than 3 residues would be built with the template-based approach.	range: [3-15] default: 3		✔
--min-coverage	The default behavior of NovaFold Antibody (--min-coverage = 0.0) is to select a template framework based on its statistical significance to the query sequence; coverage criteria are not considered. The threader picks the template with the highest (log-likelihood score)/(background log-likelihood) for matching a sequence. If a single domain matches, only that domain will be proposed as a model. The result is that the most significant template may not always cover the entire query. To ensure the selected template exceeds a particular fractional coverage, specify a higher number for --min-coverage. This compels NovaFold Antibody to locate a template with both variable and constant domains.	range: [0.0-1.0] default: 0.0		✔
--no-orient-refine	By default, NovaFold Antibody optimizes the rigid-body orientation between the light and heavy antibody chains to remove atomic clashes if they were introduced during the modeling process. To skip the optimization step, use the argument --no-orient-refine.			✔
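
The allowed values and ranges in the table can be checked before a prediction is launched. The following is a minimal Python sketch, not part of the DNASTAR software: the helper name build_supplemental_args is hypothetical, it covers only the NovaFold rows that list allowed values or a range, and it only builds the "--name value" pairs shown in the Name column. How those pairs are combined with the rest of the command line is not covered here.

```python
# Allowed values and ranges transcribed from the table above (NovaFold rows only).
BOOLEAN_OPTIONS = {"EC", "enhancedSearch", "GO", "LBS", "light"}
CHOICE_OPTIONS = {"homoflag": {"real", "benchmark"}}
RANGE_OPTIONS = {
    "hours": (1, 200),
    "idcut": (0, 1),
    "nmodel": (1, 10),
    "ntemp": (1, 50),
}


def build_supplemental_args(options):
    """Return a flat list such as ['--hours', '24'] after checking each value.

    `options` maps an argument name (without the leading dashes) to its value.
    Raises ValueError for an unknown name or a value outside the table's limits.
    Hypothetical helper: the name and interface are assumptions of this sketch.
    """
    args = []
    for name, value in options.items():
        if name in BOOLEAN_OPTIONS:
            if not isinstance(value, bool):
                raise ValueError(f"--{name} expects true or false")
            text = "true" if value else "false"
        elif name in CHOICE_OPTIONS:
            if value not in CHOICE_OPTIONS[name]:
                raise ValueError(f"--{name} does not allow {value!r}")
            text = value
        elif name in RANGE_OPTIONS:
            low, high = RANGE_OPTIONS[name]
            is_number = isinstance(value, (int, float)) and not isinstance(value, bool)
            if not is_number or not low <= value <= high:
                raise ValueError(f"--{name} must be in [{low},{high}]")
            text = str(value)
        else:
            raise ValueError(f"unknown supplemental argument: {name}")
        args += [f"--{name}", text]
    return args
```

For example, build_supplemental_args({"hours": 24, "light": True}) yields the four strings --hours, 24, --light and true, while a value such as nmodel = 11 is rejected because it falls outside the listed range.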

Notes:

* During a folding prediction, the threader and user template alignments are each ranked. Therefore, user-provided templates may not necessarily appear in the NovaFold Report’s ‘top ten’ template list.

Regarding the template-based restraint commands (--restraint2, --restraint3/INCLUDE and --restraint4/ADD):

* Only one template-based restraint parameter can be used in the novafold command string.

* The --add option can be used multiple times to introduce multiple templates to the modeling process with the novafold-antibody command. The --include option can be used in combination with the --add option.
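
The one-template-restraint rule for the novafold command can be checked before a prediction is submitted. This is a small Python sketch under stated assumptions: the function name check_template_restraints is hypothetical, the argument list is assumed to be already split into separate strings, and repeating the same restraint argument is treated as a violation as well.

```python
# Template-based restraint arguments for the novafold command, as named above.
TEMPLATE_RESTRAINTS = ("--restraint2", "--restraint3", "--restraint4")


def check_template_restraints(argv):
    """Return the single template-based restraint in argv, or None if absent.

    Raises ValueError when more than one template-based restraint is present.
    Hypothetical helper: it does not run or inspect the novafold command itself.
    """
    used = [arg for arg in argv if arg in TEMPLATE_RESTRAINTS]
    if len(used) > 1:
        raise ValueError(
            "only one template-based restraint may be used, found: " + ", ".join(used)
        )
    return used[0] if used else None
```

The check applies to novafold only; for novafold-antibody, repeated --add arguments and a combination of --add and --include are permitted, so they are not tested here.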

***********************************

Column requirements in the --restraint1 text file:

Distance rows contain the following columns from left to right:

DIST
Res_No_i
Atom_type_i
Res_No_j
Atom_type_j
Distance in Angstroms

Contact rows contain the following columns from left to right (see definitions below):

CONTACT
Res_No_i
Res_No_j

In both cases, UNK can be used in a row to represent an unknown atom.

Column definitions for a --restraint1 text file:

Given two residues that contact one another (‘Residue i’ and ‘Residue j’) or two atoms at a distance from one another (‘Atom i’ and ‘Atom j’):

Res_No_i – Residue sequence number for Residue i.
Atom_type_i – Atom name for contacting atom of Residue i.
Res_No_j – Residue sequence number for Residue j.
Atom_type_j – Atom name for contacting atom of Residue j.

Example text file for --restraint1:

DIST 12 HG21 50 HB1 8.1
DIST 14 HA 57 1HE 6.2
DIST 21 HB2 43 HD11 4.0
DIST 124 CA 84 CA 17.4
DIST 36 UNK 120 CA 17.4
CONTACT 33 6
CONTACT 60 29
CONTACT 37 345
CONTACT 109 42
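
A restraint file can be checked against the column requirements above before it is passed to --restraint1. The following Python sketch is an illustration only: the function name parse_restraint1 is hypothetical, columns are assumed to be separated by whitespace, and blank lines are assumed to be harmless and are skipped.

```python
def parse_restraint1(text):
    """Parse restraint rows into tuples; raise ValueError on a malformed file.

    DIST rows need 6 columns and CONTACT rows need 3, as listed above.
    Hypothetical helper: the name and return format are assumptions.
    """
    # The table entry for --restraint1 forbids lower-case letters in the file.
    if any(ch.islower() for ch in text):
        raise ValueError("restraint file may not contain lower-case letters")

    restraints = []
    for number, line in enumerate(text.splitlines(), start=1):
        fields = line.split()
        if not fields:
            continue  # assumption: blank lines are ignored
        if fields[0] == "DIST" and len(fields) == 6:
            res_i, atom_i, res_j, atom_j, distance = fields[1:]
            # Atom names stay as text, so UNK passes through unchanged.
            restraints.append(
                ("DIST", int(res_i), atom_i, int(res_j), atom_j, float(distance))
            )
        elif fields[0] == "CONTACT" and len(fields) == 3:
            restraints.append(("CONTACT", int(fields[1]), int(fields[2])))
        else:
            raise ValueError(f"line {number}: unrecognized row {line!r}")
    return restraints
```

Residue numbers are converted with int() and the distance with float(), so a non-numeric entry in those columns also raises ValueError.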

***********************************

When using --restraint2, note that:

The length of the aligned template residues must be ≥ 25% of the length of the query sequence.

In the coordinate section, the ATOM record indices need to be numbered sequentially, beginning at 1.

Example text file for --restraint2:

The following is a --restraint2 file for mammoth myoglobin (query) against whale myoglobin (target structure). “ATOM” rows 6-1211 have been omitted for space. The format for ATOM records is described on this PDB web page.

>query
MGLSDGEWELVLKTWGKVEADIPGHGLEVFVRLFTGHPETLEKFDKFKHLKTEGEMKASE
DLKKQGVTVLTALGGILKKKGHHQAEIQPLAQSHATKHKIPIKYLEFISDAIIHVLQSKH
PAEFGAD---------------------------
>1MBN:A
-VLSEGEWQLVLHVWAKVEADVAGHGQDILIRLFKSHPETLEKFDRFKHLKTEAEMKASE
DLKKHGVTVLTALGAILKKKGHHEAELKPLAQSHATKHKIPIKYLEFISEAIIHVLHSRH
PGDFGADAQGAMNKALELFRKDIAAKYKELGYQG
ATOM 1 N VAL A 1 -2.900 17.600 15.500 1.00 0.00 N
ATOM 2 CA VAL A 1 -3.600 16.400 15.300 1.00 0.00 C
ATOM 3 C VAL A 1 -3.000 15.300 16.200 1.00 0.00 C
ATOM 4 O VAL A 1 -3.700 14.700 17.000 1.00 0.00 O
ATOM 5 CB VAL A 1 -3.500 16.000 13.800 1.00 0.00 C
…
ATOM 1212 NE2 GLN A 152 -1.600 24.200 -1.500 1.00 0.00 N
ATOM 1213 N GLY A 153 1.500 24.700 -6.400 1.00 0.00 N
ATOM 1214 CA GLY A 153 1.100 24.000 -7.600 1.00 0.00 C
ATOM 1215 C GLY A 153 0.300 22.700 -7.500 1.00 0.00 C
ATOM 1216 O GLY A 153 -0.900 22.800 -7.100 1.00 0.00 O
TER 1217 GLY A 153
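
The two --restraint2 requirements stated above (aligned length of at least 25% of the query, and ATOM records numbered sequentially from 1) can be checked with a short script. The Python sketch below is an illustration under stated assumptions: the function name check_restraint2 is hypothetical, "aligned template residues" is interpreted as alignment columns where neither sequence has a gap, and the ATOM serial number is read as the second whitespace-separated field of each ATOM line.

```python
def check_restraint2(text, min_fraction=0.25):
    """Return a list of problems found; an empty list means both checks passed.

    Hypothetical helper: the name, the gap character '-', and the way aligned
    residues are counted are assumptions of this sketch.
    """
    sequences = []          # [header, aligned sequence] pairs in file order
    serials = []            # serial numbers of ATOM records in file order
    in_coordinates = False  # becomes True at the first ATOM record

    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        if fields[0] == "ATOM" and len(fields) > 1:
            in_coordinates = True
            serials.append(int(fields[1]))
        elif in_coordinates:
            continue  # TER and any other coordinate-section lines
        elif line.startswith(">"):
            sequences.append([line[1:].strip(), ""])
        elif sequences:
            sequences[-1][1] += fields[0]

    problems = []
    if len(sequences) != 2:
        problems.append(f"expected 2 aligned sequences, found {len(sequences)}")
    else:
        query, template = sequences[0][1], sequences[1][1]
        if len(query) != len(template):
            problems.append("aligned sequences differ in length")
        query_length = sum(1 for c in query if c != "-")
        aligned = sum(1 for q, t in zip(query, template) if q != "-" and t != "-")
        if query_length == 0 or aligned < min_fraction * query_length:
            problems.append("aligned template residues cover < 25% of the query")
    if serials != list(range(1, len(serials) + 1)):
        problems.append("ATOM serial numbers are not sequential from 1")
    return problems
```

The abbreviated example above would be reported as non-sequential because ATOM rows 6-1211 are omitted; a complete file should return an empty list.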
