API Reference
This section provides detailed documentation for all MolR classes and functions.
Core Classes
Structure
- class molr.Structure(n_atoms)
Bases:
object
A NumPy-based molecular structure representation using a Structure of Arrays (SoA) design.
This class stores molecular data in separate NumPy arrays, which enables efficient vectorized operations and compact memory use. Optional annotations are initialized lazily, so they take up memory only when they are actually used.
- Core annotations (always present):
coord: Atomic coordinates (x, y, z) as float64
atom_name: PDB atom names as U4 strings
element: Element symbols as U2 strings
res_name: Residue names as U3 strings
res_id: Residue sequence numbers as int32
chain_id: Chain identifiers as U1 strings
- Optional annotations (lazy initialization):
alt_loc: Alternate location indicators
occupancy: Occupancy values
b_factor: Temperature factors
charge: Formal charges
serial: Atom serial numbers
insertion_code: Residue insertion codes
segment_id: Segment identifiers
- Classification flags (computed on demand):
is_backbone: Boolean array for backbone atoms
is_sidechain: Boolean array for sidechain atoms
is_aromatic: Boolean array for aromatic atoms
is_ligand: Boolean array for ligand atoms
residue_type: Residue type classification
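The layout described above can be illustrated with a small, self-contained toy class. `ToySoAStructure` and its methods are hypothetical and not part of MolR; the sketch only assumes NumPy and shows two ideas from the description: one array per annotation, and optional arrays that are allocated on first access. Its `subset` method applies a single index to every array, which is the essence of how an SoA container can support mask indexing like `Structure.__getitem__`.

```python
import numpy as np


class ToySoAStructure:
    """Hypothetical toy class: one NumPy array per annotation, optional ones created lazily."""

    # Optional annotations: name -> (dtype, fill value used when first allocated)
    _OPTIONAL = {"b_factor": (np.float32, 0.0), "occupancy": (np.float32, 1.0)}

    def __init__(self, n_atoms):
        if n_atoms <= 0:
            raise ValueError("n_atoms must be positive")
        self.n_atoms = n_atoms
        # Core annotations are allocated up front, one contiguous array each.
        self.coord = np.zeros((n_atoms, 3), dtype=np.float64)
        self.atom_name = np.full(n_atoms, "", dtype="U4")
        self.element = np.full(n_atoms, "", dtype="U2")
        self.res_id = np.zeros(n_atoms, dtype=np.int32)

    def __getattr__(self, name):
        # Only reached when normal lookup fails, i.e. an optional array not yet allocated.
        spec = type(self)._OPTIONAL.get(name)
        if spec is None:
            raise AttributeError(name)
        dtype, fill = spec
        arr = np.full(self.n_atoms, fill, dtype=dtype)
        setattr(self, name, arr)  # store it so later lookups bypass __getattr__
        return arr

    def allocated_optional(self):
        """Names of optional annotations that currently occupy memory."""
        return sorted(k for k in type(self)._OPTIONAL if k in self.__dict__)

    def subset(self, index):
        """Apply one int/slice/mask index to every allocated array."""
        idx = np.atleast_1d(np.arange(self.n_atoms)[index])
        new = ToySoAStructure(len(idx))
        for name, value in self.__dict__.items():
            if isinstance(value, np.ndarray):
                setattr(new, name, value[idx])
        return new


s = ToySoAStructure(4)
s.element[:] = ["N", "C", "C", "O"]
print(s.allocated_optional())  # []
s.b_factor[:] = 20.0           # first access allocates b_factor
print(s.allocated_optional())  # ['b_factor']
carbons = s.subset(s.element == "C")
print(carbons.n_atoms)         # 2
```

Optional arrays that were never touched stay unallocated in the subset as well, so the memory saving carries over to sliced structures.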
Example
>>> import numpy as np
>>> from molr import Structure
>>> structure = Structure(n_atoms=100)
>>> structure.coord = np.random.rand(100, 3)
>>> structure.atom_name[:] = "CA"
>>> structure.add_annotation("custom_prop", dtype=np.float32, default_value=1.0)
- Parameters:
n_atoms (int)
- __init__(n_atoms)
Initialize Structure with core annotations only.
- Parameters:
n_atoms (int) – Number of atoms in the structure
- Raises:
ValueError – If n_atoms <= 0
- property coord: ndarray
Atomic coordinates array (n_atoms, 3).
- Returns:
NumPy array of atomic coordinates
- add_annotation(name, dtype=numpy.float32, default_value=None)
Add custom annotation to structure.
- Parameters:
- Raises:
ValueError – If annotation name already exists
- Return type:
- property residue_type: ndarray
String array with residue type classification (PROTEIN/DNA/RNA/LIGAND).
- __getitem__(index)
Get a subset of the structure by index.
- Parameters:
index (Union[int, slice, ndarray]) – Integer, slice, or boolean/integer array for indexing
- Return type:
- Returns:
New Structure containing selected atoms
Example
>>> subset = structure[structure.element == "C"]
>>> single_atom = structure[0]
>>> chain_a = structure[structure.chain_id == "A"]
- get_masses()
Get atomic masses for all atoms.
- Return type:
- Returns:
Array of atomic masses in amu
- get_neighbors_within(atom_idx, radius)
Get indices of atoms within radius of the specified atom.
- Parameters:
- Return type:
- Returns:
Array of neighbor atom indices (excluding query atom)
Example
>>> neighbors = structure.get_neighbors_within(100, 5.0)
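A neighbor query like `get_neighbors_within` can be sketched as a brute-force distance scan. This is a stand-in built on stated assumptions, not MolR's implementation: `has_spatial_index()` suggests MolR uses a SciPy spatial index when available, whereas the hypothetical `neighbors_within` below computes all N distances. An inclusive boundary (`<=`) is also assumed.

```python
import numpy as np


def neighbors_within(coord, atom_idx, radius):
    """Indices of atoms within `radius` of atom `atom_idx`, excluding that atom (brute force)."""
    coord = np.asarray(coord, dtype=float)
    d2 = np.sum((coord - coord[atom_idx]) ** 2, axis=1)  # squared distance to every atom
    hits = np.flatnonzero(d2 <= radius * radius)          # compare squares, no sqrt needed
    return hits[hits != atom_idx]                         # drop the query atom itself


coord = np.array([[0.0, 0.0, 0.0], [1.5, 0.0, 0.0], [0.0, 4.0, 0.0], [9.0, 0.0, 0.0]])
print(neighbors_within(coord, 0, 5.0))  # [1 2]
```

With SciPy installed, `scipy.spatial.cKDTree(coord).query_ball_point(coord[i], r)` answers the same question from a k-d tree, which pays off when many queries run against the same coordinates.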
- get_atoms_within_sphere(center, radius)
Get atoms within spherical region.
- Parameters:
- Return type:
- Returns:
Array of atom indices within the sphere
Example
>>> center = np.array([10.0, 15.0, 20.0])
>>> atoms = structure.get_atoms_within_sphere(center, 8.0)
- get_atoms_within_cog_sphere(selection, radius)
Get atoms within a spherical zone centered at the center of geometry (COG) of a selection.
- Parameters:
- Return type:
- Returns:
Array of atom indices within the COG sphere
Example
>>> active_site = structure.select("resname HIS")
>>> nearby = structure.get_atoms_within_cog_sphere(active_site, 10.0)
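The center-of-geometry zone reduces to two steps: average the coordinates of the selected atoms (no mass weighting), then run a sphere query around that point. The helper below is hypothetical; it assumes the selection is a boolean mask as returned by `select()`, and that atoms of the selection itself are included in the result, which MolR may or may not do.

```python
import numpy as np


def atoms_within_cog_sphere(coord, selection, radius):
    """Atoms within `radius` of the centre of geometry (unweighted mean) of `selection`."""
    coord = np.asarray(coord, dtype=float)
    selected = coord[selection]        # works for a boolean mask or an index array
    if len(selected) == 0:
        raise ValueError("empty selection")
    cog = selected.mean(axis=0)        # centre of geometry: plain average, no masses
    d2 = np.sum((coord - cog) ** 2, axis=1)
    return np.flatnonzero(d2 <= radius * radius)


coord = np.array([[0.0, 0.0, 0.0], [2.0, 0.0, 0.0], [1.0, 3.0, 0.0], [10.0, 0.0, 0.0]])
mask = np.array([True, True, False, False])
print(atoms_within_cog_sphere(coord, mask, 5.0))  # [0 1 2]
```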
- get_neighbors_for_atoms(atom_indices, radius)
Get neighbors for multiple atoms at once (batch operation).
- Parameters:
- Return type:
- Returns:
Dictionary mapping atom_idx -> array of neighbor indices
Example
>>> ca_atoms = structure.select("name CA")
>>> neighbors = structure.get_neighbors_for_atoms(ca_atoms, 8.0)
- get_closest_atoms(query_point, k=1)
Get k nearest atoms to a query point.
- Parameters:
- Return type:
- Returns:
Tuple of (distances, atom_indices) for k nearest atoms
Example
>>> center = np.array([0.0, 0.0, 0.0])
>>> distances, indices = structure.get_closest_atoms(center, k=5)
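For `get_closest_atoms`, a k-nearest search can be written with `numpy.argpartition`, which isolates the k smallest distances without sorting the whole array. The function name is hypothetical and the order of tied distances is unspecified; the sketch only shows the documented shape of the result, a `(distances, indices)` tuple, here sorted by distance.

```python
import numpy as np


def closest_atoms(coord, query_point, k=1):
    """Return (distances, indices) of the k atoms nearest to query_point, nearest first."""
    coord = np.asarray(coord, dtype=float)
    k = min(k, len(coord))
    dist = np.linalg.norm(coord - np.asarray(query_point, dtype=float), axis=1)
    idx = np.argpartition(dist, k - 1)[:k]  # k smallest distances, in arbitrary order
    idx = idx[np.argsort(dist[idx])]        # sort only those k by distance
    return dist[idx], idx


coord = np.array([[3.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 2.0, 0.0], [5.0, 5.0, 5.0]])
distances, indices = closest_atoms(coord, [0.0, 0.0, 0.0], k=2)
print(indices)  # [1 2]
```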
- get_atoms_between_selections(selection1, selection2, max_distance)
Find atoms from two selections within max_distance of each other.
- Parameters:
- Return type:
- Returns:
Dictionary with ‘selection1_atoms’, ‘selection2_atoms’, ‘distances’
Example
>>> protein = structure.select("protein")
>>> ligand = structure.select("resname LIG")
>>> contacts = structure.get_atoms_between_selections(protein, ligand, 5.0)
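`get_atoms_between_selections` is documented as returning a dictionary with three keys. One plausible reading, assumed here, is that the three arrays are parallel, with one entry per contacting pair. The hypothetical helper below builds the full pairwise distance matrix between the two selections by broadcasting, so its memory use grows as n1 × n2.

```python
import numpy as np


def atoms_between_selections(coord, selection1, selection2, max_distance):
    """Contact pairs (i in selection1, j in selection2) with distance <= max_distance."""
    coord = np.asarray(coord, dtype=float)
    idx1 = np.flatnonzero(selection1)  # boolean masks -> atom indices
    idx2 = np.flatnonzero(selection2)
    diff = coord[idx1][:, None, :] - coord[idx2][None, :, :]  # (n1, n2, 3) via broadcasting
    dist = np.linalg.norm(diff, axis=-1)                      # (n1, n2) distance matrix
    i, j = np.nonzero(dist <= max_distance)
    return {
        "selection1_atoms": idx1[i],
        "selection2_atoms": idx2[j],
        "distances": dist[i, j],
    }


coord = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 4.0], [0.0, 0.0, 10.0]])
protein = np.array([True, True, False, False])
contacts = atoms_between_selections(coord, protein, ~protein, 5.0)
print(contacts["selection2_atoms"])  # [2 2]
```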
- has_spatial_index()
Check if spatial index is available.
- Return type:
- Returns:
True if scipy is available and spatial indexing is possible
- get_bonds_to(other_atoms, max_distance=2.0)
Find potential bonds to other atoms based on distance.
- select(selection_string)
Select atoms using selection language.
- Parameters:
selection_string (str) – Selection expression
- Return type:
- Returns:
Boolean array of selected atoms
Examples
>>> mask = structure.select("protein and backbone")
>>> mask = structure.select("resname ALA GLY")
>>> mask = structure.select("chain A and resid 1:50")
- Raises:
NotImplementedError – For unsupported selection syntax
- classmethod from_pdb(filename)
Create Structure from PDB file.
- Parameters:
filename (str) – Path to PDB file
- Return type:
- Returns:
Structure object with all atoms and annotations
- Raises:
ValueError – If PDB file contains multiple models
Example
>>> structure = Structure.from_pdb("example.pdb")
>>> print(f"Loaded {structure.n_atoms} atoms")
- classmethod from_mmcif(filename)
Create Structure from mmCIF file.
- Parameters:
filename (str) – Path to mmCIF file
- Return type:
- Returns:
Structure object with all atoms and annotations
- Raises:
ValueError – If mmCIF file contains multiple models
Example
>>> structure = Structure.from_mmcif("example.cif")
>>> print(f"Loaded {structure.n_atoms} atoms")
- classmethod from_pdb_string(pdb_content)
Create Structure from PDB content string.
- Parameters:
pdb_content (str) – PDB file content as string
- Return type:
- Returns:
Structure object with all atoms and annotations
- Raises:
ValueError – If PDB content contains multiple models
Example
>>> pdb_data = "ATOM      1  N   ALA A   1      20.154  16.967  22.478  1.00 10.00           N"
>>> structure = Structure.from_pdb_string(pdb_data)
- classmethod from_mmcif_string(mmcif_content)
Create Structure from mmCIF content string.
- Parameters:
mmcif_content (str) – mmCIF file content as string
- Return type:
- Returns:
Structure object with all atoms and annotations
- Raises:
ValueError – If mmCIF content contains multiple models
Example
>>> mmcif_data = "data_test\nloop_\n_atom_site.group_PDB\n..."
>>> structure = Structure.from_mmcif_string(mmcif_data)
- detect_bonds(vdw_factor=0.75, use_file_bonds=True, store_bonds=True)
Detect bonds using the simplified default detector.
- Parameters:
- Return type:
molr.BondList
- Returns:
BondList with detected bonds
Example
>>> structure = Structure.from_pdb("protein.pdb")
>>> bonds = structure.detect_bonds()
>>> print(f"Detected {len(bonds)} bonds")
- property bonds: molr.BondList | None
Get bonds associated with this structure.
- Returns:
BondList if bonds have been detected/assigned, None otherwise
- has_bonds()
Check if structure has bond information.
- Return type:
- Returns:
True if bonds are available
The main class for representing molecular structures with spatial indexing capabilities.
Key Methods:
- from_pdb() – Load from a PDB file
- from_mmcif() – Load from an mmCIF file
- select() – Atom selection using the query language
- detect_bonds() – Automatic bond detection
- get_neighbors_within() – Spatial neighbor queries
StructureEnsemble
- class molr.StructureEnsemble(template, n_frames=0)
Bases:
object
Ensemble of molecular structures representing trajectory data.
This class stores multiple frames of coordinate data while sharing annotations (atom names, elements, etc.) across all frames for memory efficiency. Designed for trajectory analysis and multi-model PDB files.
- Memory layout:
coords: (n_frames, n_atoms, 3) array of coordinates
Annotations shared from template Structure
Optional time and box information per frame
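The memory layout above amounts to one contiguous `(n_frames, n_atoms, 3)` array plus a single set of annotations. The hypothetical `stack_frames` helper shows the stacking and the atom-count check that `from_structures` and `add_frame` document as raising `ValueError`; the optional per-frame time array is an assumption about how "time information per frame" might be stored.

```python
import numpy as np


def stack_frames(frames, times=None):
    """Stack per-frame (n_atoms, 3) coordinates into one (n_frames, n_atoms, 3) array."""
    frames = [np.asarray(f, dtype=np.float64) for f in frames]
    if not frames:
        raise ValueError("need at least one frame")
    n_atoms = frames[0].shape[0]
    for i, f in enumerate(frames):
        if f.shape != (n_atoms, 3):
            raise ValueError(f"frame {i} has shape {f.shape}, expected {(n_atoms, 3)}")
    coords = np.stack(frames)  # one contiguous block holding every frame
    if times is not None:
        times = np.asarray(times, dtype=np.float64)
        if times.shape != (len(frames),):
            raise ValueError("need exactly one time value per frame")
    return coords, times


# Annotations such as atom names are stored once and shared by all frames.
atom_name = np.array(["N", "CA", "C"])
coords, times = stack_frames([np.zeros((3, 3)), np.ones((3, 3))], times=[0.0, 10.0])
print(coords.shape)     # (2, 3, 3)
ca_path = coords[:, 1]  # trajectory of the CA atom across all frames
```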
Example
>>> ensemble = StructureEnsemble.from_structures([struct1, struct2])
>>> print(f"Trajectory with {ensemble.n_frames} frames")
>>> frame0 = ensemble[0]  # Returns Structure for frame 0
- classmethod from_pdb(filename)
Create StructureEnsemble from multi-model PDB file.
- Parameters:
filename (str) – Path to multi-model PDB file
- Return type:
- Returns:
StructureEnsemble with all models as frames
- Raises:
ValueError – If the PDB file contains only a single model
- classmethod from_pdb_string(pdb_content)
Create StructureEnsemble from multi-model PDB content string.
- Parameters:
pdb_content (str) – PDB content string with multiple models
- Return type:
- Returns:
StructureEnsemble with all models as frames
- Raises:
ValueError – If the PDB content contains only a single model
- classmethod from_mmcif(filename)
Create StructureEnsemble from multi-model mmCIF file.
- Parameters:
filename (str) – Path to multi-model mmCIF file
- Return type:
- Returns:
StructureEnsemble with all models as frames
- Raises:
ValueError – If the mmCIF file contains only a single model
- classmethod from_mmcif_string(mmcif_content)
Create StructureEnsemble from multi-model mmCIF content string.
- Parameters:
mmcif_content (str) – mmCIF content string with multiple models
- Return type:
- Returns:
StructureEnsemble with all models as frames
- Raises:
ValueError – If the mmCIF content contains only a single model
- classmethod from_structures(structures)
Create StructureEnsemble from a list of Structure objects.
- Parameters:
structures (List[Structure]) – List of Structure objects with the same atoms
- Return type:
- Returns:
StructureEnsemble with structures as frames
- Raises:
ValueError – If structures have different atom counts
- add_frame(structure, time=None)
Add a new frame to the ensemble.
- Parameters:
- Raises:
ValueError – If structure atom count doesn’t match
- Return type:
- __getitem__(index)
Get frame(s) from ensemble.
- Parameters:
- Return type:
- Returns:
Structure for single frame, StructureEnsemble for slice
Examples
>>> frame0 = ensemble[0]  # Single frame as Structure
>>> sub_traj = ensemble[10:20]  # Sub-trajectory as StructureEnsemble
Multi-model trajectory representation for handling structural ensembles.
Key Methods:
- from_pdb() – Load a multi-model PDB file
- __getitem__() – Access individual models
- __len__() – Number of models
BondList
- class molr.BondList(n_bonds=0)
Bases:
object
Efficient storage and manipulation of molecular bonds with smart indexing.
The BondList class stores bonds as pairs of atom indices with additional metadata such as bond order, detection method, and confidence scores. It supports smart indexing that automatically adjusts bond indices when the parent structure is sliced or modified.
- Bond storage uses Structure of Arrays (SoA) design:
bonds: (N, 2) array of atom index pairs
bond_order: Bond order (1=single, 2=double, 3=triple, 1.5=aromatic)
bond_type: Bond type classification
detection_method: How the bond was detected
confidence: Confidence score for bond existence
- Smart indexing features:
Automatic bond index adjustment when structure is sliced
Efficient bond filtering based on atom selections
Bond validation against structure changes
Example
>>> bond_list = BondList()
>>> bond_list.add_bond(0, 1, bond_order=1.0, bond_type="covalent")
>>> bond_list.add_bonds([(2, 3), (3, 4)], bond_orders=[1.0, 2.0])
>>> subset_bonds = bond_list.filter_by_atoms([0, 1, 2])
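The smart indexing described above can be sketched as a renumbering step: when a structure is sliced with an atom mask, bonds that lose an atom are discarded and the surviving atom indices are mapped to their new positions. `slice_bonds` is a hypothetical helper, not `BondList`'s API. It also returns the kept-bond mask, so per-bond arrays such as `bond_order` can be sliced consistently, which is the practical benefit of storing bonds as parallel arrays.

```python
import numpy as np


def slice_bonds(bonds, atom_mask):
    """Keep bonds whose two atoms both survive `atom_mask`, renumbered to the new atom order."""
    bonds = np.asarray(bonds, dtype=np.int64).reshape(-1, 2)
    atom_mask = np.asarray(atom_mask, dtype=bool)
    keep = atom_mask[bonds[:, 0]] & atom_mask[bonds[:, 1]]
    # new_index[old] = position of atom `old` in the sliced structure (-1 if dropped)
    new_index = np.full(atom_mask.size, -1, dtype=np.int64)
    new_index[atom_mask] = np.arange(atom_mask.sum())
    return new_index[bonds[keep]], keep


bonds = [(0, 1), (1, 2), (2, 3), (3, 4)]
bond_order = np.array([1.0, 1.0, 2.0, 1.0])
mask = np.array([False, True, True, True, False])  # drop atoms 0 and 4
new_bonds, keep = slice_bonds(bonds, mask)
print(new_bonds.tolist())  # [[0, 1], [1, 2]]
print(bond_order[keep])    # [1. 2.]
```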
- Parameters:
n_bonds (int, default: 0)
- __init__(n_bonds=0)
Initialize BondList.
- Parameters:
n_bonds (int, default: 0) – Initial number of bonds (default: 0 for dynamic growth)
- add_property(name, dtype=numpy.float32, default_value=None)
Add custom property to bonds.
- Parameters:
- Raises:
ValueError – If property name already exists
- Return type:
- add_bond(atom1, atom2, bond_order=1.0, bond_type='covalent', **kwargs)
Add a single bond.
- Parameters:
- Return type:
- Returns:
Index of the added bond
- Raises:
ValueError – If atoms are the same or invalid
- add_bonds(bond_pairs, bond_orders=None, bond_types=None, **kwargs)
Add multiple bonds efficiently.
- Parameters:
bond_pairs (List[Tuple[int, int]]) – List of (atom1, atom2) tuples
bond_orders (Optional[List[float]], default: None) – Optional list of bond orders (default: all 1.0)
bond_types (Optional[List[str]], default: None) – Optional list of bond types (default: all “covalent”)
**kwargs (Any) – Additional properties as lists
- Return type:
- Returns:
Array of bond indices for added bonds
- Raises:
ValueError – If list lengths don’t match
- filter_by_atoms(atom_indices)
Create new BondList containing only bonds between specified atoms.
Efficient storage and manipulation of molecular bonds.
Key Methods:
- get_bond() – Get the bond between two atoms
- get_neighbors() – Get bonded neighbors
- to_connectivity_matrix() – Convert to an adjacency matrix
Bond Detection
DefaultBondDetector
- class molr.bond_detection.DefaultBondDetector(vdw_factor=0.75)
Bases:
object
Simplified bond detector using templates and distance criteria.
This replaces the complex hierarchical system with a straightforward approach:
1. Apply residue templates (from residue_bonds.py or CCD)
2. Apply distance-based detection as a fallback
- Parameters:
vdw_factor (float, default: 0.75)
- __init__(vdw_factor=0.75)
Initialize the default bond detector.
- Parameters:
vdw_factor (float, default: 0.75) – Scaling factor applied to van der Waals radii in distance-based detection (0.0 < factor <= 1.0). The default of 0.75 works well for most cases.
Default bond detector that combines residue templates and distance-based detection.
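The two-step strategy of `DefaultBondDetector` can be sketched as follows. Everything here is an assumption made for illustration: the single trimmed template stands in for residue_bonds.py or the CCD, and the distance test `d <= vdw_factor * (r_i + r_j)` is one reading of how `vdw_factor` scales van der Waals radii. The radii are copied from `AtomicData.VDW_RADII`.

```python
import numpy as np

# Hypothetical, heavily trimmed residue template: intra-residue bonds by atom name.
TEMPLATES = {"ALA": [("N", "CA"), ("CA", "C"), ("C", "O"), ("CA", "CB")]}
# A few van der Waals radii (Å), copied from AtomicData.VDW_RADII.
VDW_RADII = {"C": 1.7, "N": 1.55, "O": 1.52, "S": 1.8}


def detect_bonds_sketch(res_names, res_ids, atom_names, elements, coord, vdw_factor=0.75):
    """Return sorted (i, j) bonds: templates first, distance test for the remaining atoms."""
    coord = np.asarray(coord, dtype=float)
    res_ids = np.asarray(res_ids)
    n = len(atom_names)
    bonds = set()
    covered = np.zeros(n, dtype=bool)

    # Step 1: residue templates, matched by atom name inside each residue.
    for rid in np.unique(res_ids):
        idx = np.flatnonzero(res_ids == rid)
        template = TEMPLATES.get(res_names[idx[0]])
        if template is None:
            continue
        covered[idx] = True
        by_name = {atom_names[i]: int(i) for i in idx}
        for a, b in template:
            if a in by_name and b in by_name:
                bonds.add(tuple(sorted((by_name[a], by_name[b]))))

    # Step 2: distance fallback for atoms that no template covered.
    for i in np.flatnonzero(~covered):
        for j in range(n):
            if j == i:
                continue
            cutoff = vdw_factor * (VDW_RADII[elements[i]] + VDW_RADII[elements[j]])
            if np.linalg.norm(coord[i] - coord[j]) <= cutoff:
                bonds.add(tuple(sorted((int(i), j))))
    return sorted(bonds)


res_names = ["ALA"] * 5 + ["LIG"] * 2
res_ids = [1] * 5 + [2] * 2
atom_names = ["N", "CA", "C", "O", "CB", "C1", "O1"]
elements = ["N", "C", "C", "O", "C", "C", "O"]
coord = [[0, 0, 0], [1.46, 0, 0], [2.0, 1.4, 0], [1.3, 2.4, 0], [2.0, -0.8, 1.2],
         [10.0, 0, 0], [11.2, 0, 0]]
print(detect_bonds_sketch(res_names, res_ids, atom_names, elements, coord))
# [(0, 1), (1, 2), (1, 4), (2, 3), (5, 6)]
```

The arithmetic shows why templates come first: with these radii, 0.75 × (1.70 + 1.70) = 2.55 Å for a carbon pair, so two carbons separated by a third atom (about 2.5 Å apart) would also pass the distance test. The sketch also has no inter-residue rules, so peptide bonds between templated residues are not detected.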
Bond Detection Functions
- molr.bond_detection.detect_bonds(structure, vdw_factor=0.75, use_file_bonds=True)
Convenience function to detect bonds in a structure.
Main function for bond detection in molecular structures.
I/O Parsers
PDB Parser
- class molr.PDBParser
Bases:
object
PDB file parser for the space module.
Designed specifically for the NumPy-based Structure class, this parser converts pdbreader output directly to NumPy arrays for optimal performance.
Features:
- Direct conversion to NumPy arrays
- CONECT record parsing for explicit bonds
- Multi-model support for trajectories
- Efficient memory usage
- Full PDB annotation support
- parse_file(filename)
Parse a PDB file and return a Structure.
- Parameters:
filename (str) – Path to the PDB file
- Return type:
- Returns:
Structure object with all atoms and annotations
- Raises:
IOError – If file cannot be read
ValueError – If PDB format is invalid
Parser for PDB format files with support for:
Multi-model structures
CONECT record parsing
Alternate conformations
Insertion codes
Crystal information
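PDB coordinate records are fixed-width, which is why whitespace matters in PDB strings. The hypothetical `parse_pdb_atoms` below is not MolR's parser (which, per the description above, builds on pdbreader); it slices the standard PDB column ranges directly into NumPy arrays and turns CONECT records into atom-index pairs. It ignores alternate locations, insertion codes, and multiple models, and the second ATOM record in the usage example is made-up sample data.

```python
import numpy as np


def parse_pdb_atoms(pdb_text):
    """Parse ATOM/HETATM and CONECT records into SoA arrays and bond pairs (sketch)."""
    rows = [ln for ln in pdb_text.splitlines() if ln.startswith(("ATOM  ", "HETATM"))]
    atoms = {
        "serial": np.array([int(ln[6:11]) for ln in rows], dtype=np.int64),
        "atom_name": np.array([ln[12:16].strip() for ln in rows], dtype="U4"),
        "res_name": np.array([ln[17:20].strip() for ln in rows], dtype="U3"),
        "chain_id": np.array([ln[21] for ln in rows], dtype="U1"),
        "res_id": np.array([int(ln[22:26]) for ln in rows], dtype=np.int32),
        "coord": np.array(
            [[float(ln[30:38]), float(ln[38:46]), float(ln[46:54])] for ln in rows],
            dtype=np.float64,
        ).reshape(-1, 3),
        "element": np.array([ln[76:78].strip() for ln in rows], dtype="U2"),
    }
    # CONECT: atom serial numbers sit in 5-character fields starting at column 7.
    index_of = {int(s): i for i, s in enumerate(atoms["serial"])}
    bonds = set()
    for ln in pdb_text.splitlines():
        if not ln.startswith("CONECT"):
            continue
        serials = [int(ln[k:k + 5]) for k in range(6, 31, 5) if ln[k:k + 5].strip()]
        for partner in serials[1:]:
            if serials[0] in index_of and partner in index_of:
                bonds.add(tuple(sorted((index_of[serials[0]], index_of[partner]))))
    return atoms, sorted(bonds)


pdb = "\n".join([
    "ATOM      1  N   ALA A   1      20.154  16.967  22.478  1.00 10.00           N",
    "ATOM      2  CA  ALA A   1      21.618  17.094  22.297  1.00 10.00           C",
    "CONECT    1    2",
])
atoms, bonds = parse_pdb_atoms(pdb)
print(atoms["atom_name"], bonds)  # ['N' 'CA'] [(0, 1)]
```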
mmCIF Parser
- class molr.mmCIFParser
Bases:
object
mmCIF file parser for the space module.
Designed specifically for the NumPy-based Structure class, this parser converts mmcif output directly to NumPy arrays for optimal performance.
Features:
- Direct conversion to NumPy arrays
- Multi-model support for trajectories
- Efficient memory usage
- Full mmCIF annotation support
- Chemical bond information from mmCIF data
- parse_file(filename)
Parse an mmCIF file and return a Structure or StructureEnsemble.
- Parameters:
filename (str) – Path to the mmCIF file
- Return type:
- Returns:
Structure object for single model, StructureEnsemble for multi-model
- Raises:
IOError – If file cannot be read
ValueError – If mmCIF format is invalid
Parser for mmCIF format files with support for:
Chemical bond information
Large structure handling
Complete metadata extraction
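mmCIF stores atoms as a `loop_` table: a list of `_atom_site.*` column names followed by whitespace-separated values, row after row. The simplified, hypothetical parser below collects one such loop and then groups coordinates by `pdbx_PDB_model_num`, which is the kind of check that decides between returning a Structure and a StructureEnsemble. It uses `shlex.split` as an approximation of CIF quoting, which is not exact (CIF has no backslash escapes and allows multi-line `;` text fields), so treat it as an illustration only.

```python
import shlex

import numpy as np


def parse_atom_site(cif_text):
    """Collect one `_atom_site` loop into {column name: list of str} (simplified sketch)."""
    headers, tokens = [], []
    for raw in cif_text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", "data_")) or line == "loop_":
            continue
        if line.startswith("_atom_site."):
            headers.append(line[len("_atom_site."):])
        else:
            tokens.extend(shlex.split(line))  # approximates CIF quoting for simple cases
    if not headers or len(tokens) % len(headers):
        raise ValueError("malformed _atom_site loop")
    n_cols = len(headers)
    # Values are stored row after row, so column k is every n_cols-th token from k.
    return {h: tokens[k::n_cols] for k, h in enumerate(headers)}


def split_models(columns):
    """Group coordinates by model number: one group -> Structure, several -> ensemble."""
    models = np.array([int(v) for v in columns["pdbx_PDB_model_num"]])
    xyz = np.array([[float(x), float(y), float(z)] for x, y, z in
                    zip(columns["Cartn_x"], columns["Cartn_y"], columns["Cartn_z"])])
    return {int(m): xyz[models == m] for m in np.unique(models)}


cif = """data_test
loop_
_atom_site.group_PDB
_atom_site.label_atom_id
_atom_site.Cartn_x
_atom_site.Cartn_y
_atom_site.Cartn_z
_atom_site.pdbx_PDB_model_num
ATOM N 1.0 2.0 3.0 1
ATOM "O5'" 2.0 2.0 3.0 1
ATOM N 1.1 2.1 3.1 2
ATOM "O5'" 2.1 2.1 3.1 2
"""
models = split_models(parse_atom_site(cif))
print(len(models))  # 2
```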
Selection System
Selection Engine
- class molr.selection.SelectionEngine(cache_size=100)
Bases:
object
Engine for evaluating atom selections on structures.
Provides caching and optimization for repeated selections.
- Parameters:
cache_size (int, default: 100)
- __init__(cache_size=100)
Initialize selection engine.
- Parameters:
cache_size (int, default: 100) – Maximum number of cached selections
- select(structure, selection)
Select atoms from a structure.
- Parameters:
structure (Structure) – The structure to select from
selection (Union[str, SelectionExpression]) – Selection string or expression
- Return type:
- Returns:
Boolean array indicating selected atoms
- Raises:
ParseException – If selection string is invalid
- select_atoms(structure, selection)
Return a new Structure containing only selected atoms.
- Parameters:
structure (Structure) – The structure to select from
selection (Union[str, SelectionExpression]) – Selection string or expression
- Return type:
- Returns:
New Structure with selected atoms
- count(structure, selection)
Count atoms matching selection.
- Parameters:
structure (Structure) – The structure to select from
selection (Union[str, SelectionExpression]) – Selection string or expression
- Return type:
- Returns:
Number of selected atoms
Main engine for parsing and evaluating selection expressions.
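The `cache_size` parameter suggests a bounded cache. A least-recently-used policy is one common choice and is assumed here; so is caching parsed expressions keyed by the selection string, which stays valid across structures (caching boolean masks would additionally require keying on the structure). `LRUCache` is a hypothetical helper built on `collections.OrderedDict`.

```python
from collections import OrderedDict


class LRUCache:
    """Bounded mapping that evicts the least recently used entry once full."""

    def __init__(self, maxsize=100):
        self.maxsize = maxsize
        self._data = OrderedDict()

    def get_or_compute(self, key, compute):
        if key in self._data:
            self._data.move_to_end(key)     # hit: mark as most recently used
            return self._data[key]
        value = compute(key)                # miss: e.g. parse the selection string
        self._data[key] = value
        if len(self._data) > self.maxsize:
            self._data.popitem(last=False)  # evict the least recently used entry
        return value

    def keys(self):
        return list(self._data)


cache = LRUCache(maxsize=2)
# str.split stands in for a real selection parser here.
tokens = cache.get_or_compute("protein and backbone", str.split)
print(tokens)  # ['protein', 'and', 'backbone']
```

For a plain function, `functools.lru_cache(maxsize=...)` from the standard library provides the same eviction policy.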
Selection Functions
- molr.selection.select(structure, selection)
Select atoms from a structure.
- Parameters:
structure (Structure) – The structure to select from
selection (Union[str, SelectionExpression]) – Selection string or expression
- Return type:
- Returns:
Boolean array indicating selected atoms
Main selection function for atom queries.
- molr.selection.select_atoms(structure, selection)
Return a new Structure containing only selected atoms.
- Parameters:
structure (Structure) – The structure to select from
selection (Union[str, SelectionExpression]) – Selection string or expression
- Return type:
- Returns:
New Structure with selected atoms
Alternative selection function.
Selection Parser
- class molr.selection.SelectionParser
Bases:
object
Parser for atom selection language.
- Supports syntax like:
“protein and backbone”
“resname ALA GLY”
“chain A and resid 1:100”
“element C N O”
“not water”
“(protein and chain A) or ligand”
“byres (ligand and within 5 of protein)”
- parse(selection_string)
Parse a selection string into a SelectionExpression.
- Parameters:
selection_string (str) – The selection string to parse
- Return type:
- Returns:
SelectionExpression object
- Raises:
ParseException – If the string cannot be parsed
pyparsing-based parser for selection language syntax.
Supported Expressions:
Atom properties: name, element, resname, chain
Spatial queries: within, around, cog
Boolean operations: and, or, not
Residue modifiers: byres
Predefined groups: protein, backbone, sidechain
Expression Classes
Base Expression
- class molr.selection.SelectionExpression
Bases:
ABC
Abstract base class for all selection expressions.
Selection expressions form a tree structure that can be evaluated against a Structure to produce a boolean mask indicating which atoms are selected.
- __and__(other)
Create AND expression using & operator.
- Parameters:
other (SelectionExpression)
- Return type:
- __or__(other)
Create OR expression using | operator.
- Parameters:
other (SelectionExpression)
- Return type:
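The expression tree and the `&`/`|` operators can be mimicked with a few toy classes. `Expr`, `Field`, `And`, `Or`, and `Not` are hypothetical stand-ins for `SelectionExpression` and its subclasses, and a dictionary of NumPy arrays stands in for a Structure. Each node returns a boolean mask, and the operators only build new nodes; nothing is evaluated until `evaluate` is called on the root.

```python
from abc import ABC, abstractmethod

import numpy as np


class Expr(ABC):
    """Toy analogue of SelectionExpression: evaluate() returns a boolean mask."""

    @abstractmethod
    def evaluate(self, atoms):
        ...

    def __and__(self, other):
        return And(self, other)

    def __or__(self, other):
        return Or(self, other)


class Field(Expr):
    """Match atoms whose annotation `name` is one of `values` (like ElementExpression)."""

    def __init__(self, name, values):
        self.name, self.values = name, list(values)

    def evaluate(self, atoms):
        return np.isin(atoms[self.name], self.values)


class And(Expr):
    def __init__(self, left, right):
        self.left, self.right = left, right

    def evaluate(self, atoms):
        return self.left.evaluate(atoms) & self.right.evaluate(atoms)


class Or(Expr):
    def __init__(self, left, right):
        self.left, self.right = left, right

    def evaluate(self, atoms):
        return self.left.evaluate(atoms) | self.right.evaluate(atoms)


class Not(Expr):
    def __init__(self, operand):
        self.operand = operand

    def evaluate(self, atoms):
        return ~self.operand.evaluate(atoms)


atoms = {"name": np.array(["N", "CA", "C", "O", "CB"]),
         "chain": np.array(["A", "A", "B", "B", "A"])}
expr = Field("name", ["CA", "CB"]) & Field("chain", ["A"])
print(expr.evaluate(atoms))  # [False  True False False  True]
```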
Atom Property Expressions
- class molr.selection.ElementExpression(elements)
Bases:
SelectionExpression
Select atoms by element type.
- class molr.selection.AtomNameExpression(names)
Bases:
SelectionExpression
Select atoms by atom name.
- class molr.selection.ResidueNameExpression(resnames)
Bases:
SelectionExpression
Select atoms by residue name.
- class molr.selection.ResidueIdExpression(resids)
Bases:
SelectionExpression
Select atoms by residue ID.
- class molr.selection.ChainExpression(chains)
Bases:
SelectionExpression
Select atoms by chain ID.
- class molr.selection.IndexExpression(indices)
Bases:
SelectionExpression
Select atoms by index.
Structural Expressions
- class molr.selection.BackboneExpression
Bases:
SelectionExpression
Select backbone atoms.
- class molr.selection.SidechainExpression
Bases:
SelectionExpression
Select sidechain atoms.
- class molr.selection.ProteinExpression
Bases:
SelectionExpression
Select protein atoms.
- class molr.selection.NucleicExpression
Bases:
SelectionExpression
Select nucleic acid atoms.
- class molr.selection.DNAExpression
Bases:
SelectionExpression
Select DNA atoms.
- class molr.selection.RNAExpression
Bases:
SelectionExpression
Select RNA atoms.
- class molr.selection.LigandExpression
Bases:
SelectionExpression
Select ligand atoms.
- class molr.selection.AromaticExpression
Bases:
SelectionExpression
Select aromatic atoms.
Boolean Expressions
- class molr.selection.AndExpression(left, right)
Bases:
SelectionExpression
Logical AND of two expressions.
- Parameters:
left (SelectionExpression)
right (SelectionExpression)
- __init__(left, right)
Initialize AND expression.
- Parameters:
left (SelectionExpression) – Left operand
right (SelectionExpression) – Right operand
- class molr.selection.OrExpression(left, right)
Bases:
SelectionExpression
Logical OR of two expressions.
- Parameters:
left (SelectionExpression)
right (SelectionExpression)
- __init__(left, right)
Initialize OR expression.
- Parameters:
left (SelectionExpression) – Left operand
right (SelectionExpression) – Right operand
- class molr.selection.NotExpression(operand)
Bases:
SelectionExpression
Logical NOT of an expression.
- Parameters:
operand (SelectionExpression)
- __init__(operand)
Initialize NOT expression.
- Parameters:
operand (SelectionExpression) – Expression to negate
Special Expressions
- class molr.selection.AllExpression
Bases:
SelectionExpression
Select all atoms.
- class molr.selection.NoneExpression
Bases:
SelectionExpression
Select no atoms.
- class molr.selection.ByResidueExpression(atom_selection)
Bases:
SelectionExpression
Select complete residues based on atom selection.
- Parameters:
atom_selection (SelectionExpression)
- __init__(atom_selection)
Initialize by-residue selection.
- Parameters:
atom_selection (SelectionExpression) – Expression to identify residues
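`byres` can be understood as a mask expansion: find the residues touched by the atom-level selection, then select every atom in those residues. This sketch identifies a residue by chain ID and residue number only (insertion codes are ignored), which is an assumption; `expand_to_residues` is a hypothetical name.

```python
import numpy as np


def expand_to_residues(atom_mask, chain_id, res_id):
    """Select every atom of each (chain_id, res_id) residue touched by atom_mask."""
    atom_mask = np.asarray(atom_mask, dtype=bool)
    # One string key per atom, e.g. "A:42"; insertion codes are ignored in this sketch.
    keys = np.char.add(np.char.add(np.asarray(chain_id, dtype=str), ":"),
                       np.asarray(res_id).astype(str))
    return np.isin(keys, keys[atom_mask])


chain_id = ["A", "A", "A", "B", "B"]
res_id = [1, 1, 2, 1, 1]
hit = [False, True, False, False, False]  # one atom of residue A:1
print(expand_to_residues(hit, chain_id, res_id))  # [ True  True False False False]
```

Keying on the chain as well as the residue number matters: residue 1 of chain B is not pulled in even though it shares the number.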
Utilities
Atom Utilities
This module contains utility functions for working with PDB atoms and elements.
- molr.utilities.atom_utils.get_element_from_pdb_atom(atom_name)
Map PDB atom name to chemical element using regex patterns.
This function uses regular expressions to identify the element type from PDB atom naming conventions, handling complex cases like:
- Greek letter remoteness indicators (CA, CB, CG, CD, CE, CZ, CH)
- Numbered variants (C1’, H2’’, OP1, etc.)
- Ion charges (CA2+, MG2+, etc.)
- IUPAC hydrogen naming conventions
- Parameters:
atom_name (str) – PDB atom name (e.g., 'CA', 'OP1', "H2'", 'CA2+')
- Returns:
Chemical element symbol (e.g., ‘C’, ‘O’, ‘H’, ‘CA’)
- Return type:
Examples
>>> get_element_from_pdb_atom('CA')
'C'
>>> get_element_from_pdb_atom('OP1')
'O'
>>> get_element_from_pdb_atom('CA2+')
'CA'
>>> get_element_from_pdb_atom("H2'")
'H'
- molr.utilities.atom_utils.pdb_atom_to_element(atom_name)
High-performance mapping of PDB atom name to chemical element.
Uses a pre-computed dictionary for common atoms and falls back to regex-based pattern matching for less common cases.
Key Functions:
- classify_atom_types() – Classify backbone/sidechain atoms
- calculate_center_of_geometry() – COG calculation
- get_vdw_radius() – van der Waals radii lookup
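The dictionary-first, regex-fallback strategy of `pdb_atom_to_element` can be sketched like this. The dictionary is a trimmed copy of `PDB_ATOM_TO_ELEMENT`; the two regular expressions and the function name are hypothetical and much simpler than whatever MolR uses, but they reproduce the documented examples, including `'CA'` mapping to `'C'` while `'CA2+'` maps to `'CA'`.

```python
import re

# Trimmed copy of PDB_ATOM_TO_ELEMENT (full table in molr.constants.pdb_constants).
COMMON = {"CA": "C", "CB": "C", "C5M": "C", "N": "N", "O": "O", "OP1": "O", "SG": "S"}

# Hypothetical fallback patterns, tried in order.
_ION = re.compile(r"^([A-Z]{1,2})\d*[+-]$")  # charged ions: CA2+, MG2+, CL-
_FIRST_LETTER = re.compile(r"^\d*([A-Z])")   # HG21 -> H, 1HB -> H, C1' -> C


def atom_to_element(atom_name):
    """Fast dictionary lookup first, then simple regex rules for everything else."""
    name = atom_name.strip().upper()
    if name in COMMON:
        return COMMON[name]
    m = _ION.match(name)
    if m:
        return m.group(1)
    m = _FIRST_LETTER.match(name)
    if m:
        return m.group(1)
    raise ValueError(f"cannot infer element from {atom_name!r}")


for n in ["CA", "OP1", "CA2+", "H2'"]:
    print(n, atom_to_element(n))  # C, O, CA, H
```

A known blind spot of such name-only rules: an uncharged two-letter element written as a bare atom name, such as FE in a heme group, comes out as F here, so the element column of the file should be preferred whenever it is present.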
Constants
Atomic Data
Atomic Data Constants
This module contains atomic properties and constants for all elements commonly found in protein, DNA, RNA, and water molecules in PDB structures.
- class molr.constants.atomic_data.AtomicData
Atomic properties and constants.
This class contains atomic data for all elements commonly found in protein, DNA, RNA, and water molecules in PDB structures.
- COVALENT_RADII = {'BR': 1.14, 'C': 0.76, 'CA': 1.76, 'CL': 0.99, 'CO': 1.26, 'CU': 1.32, 'D': 0.31, 'F': 0.57, 'FE': 1.32, 'H': 0.31, 'I': 1.33, 'K': 2.03, 'MG': 1.41, 'MN': 1.39, 'N': 0.71, 'NA': 1.66, 'NI': 1.24, 'O': 0.66, 'P': 1.07, 'S': 1.05, 'ZN': 1.22}
- VDW_RADII = {'AL': 1.84, 'AR': 1.88, 'AS': 1.85, 'AT': 2.02, 'AU': 2.1, 'B': 1.92, 'BA': 2.68, 'BE': 1.53, 'BI': 2.07, 'BR': 1.83, 'C': 1.7, 'CA': 2.31, 'CL': 1.75, 'CO': 2.0, 'CS': 3.43, 'CU': 2.0, 'F': 1.47, 'FE': 2.05, 'FR': 3.48, 'GA': 1.87, 'GE': 2.11, 'H': 1.1, 'HE': 1.4, 'I': 1.98, 'IN': 1.93, 'K': 2.75, 'KR': 2.02, 'LI': 1.81, 'MG': 1.73, 'MN': 2.05, 'MO': 2.1, 'N': 1.55, 'NA': 2.27, 'NE': 1.54, 'NI': 2.0, 'O': 1.52, 'P': 1.8, 'PB': 2.02, 'PO': 1.97, 'PT': 2.05, 'RA': 2.83, 'RB': 3.03, 'RN': 2.2, 'RU': 2.05, 'S': 1.8, 'SB': 2.06, 'SE': 1.9, 'SI': 2.1, 'SN': 2.17, 'SR': 2.49, 'TE': 2.06, 'TL': 1.96, 'W': 2.1, 'XE': 2.16, 'ZN': 2.1}
- ELECTRONEGATIVITY = {'BR': 2.96, 'C': 2.55, 'CA': 1.0, 'CL': 3.16, 'CO': 1.88, 'CU': 1.9, 'D': 2.2, 'F': 3.98, 'FE': 1.83, 'H': 2.2, 'I': 2.66, 'K': 0.82, 'MG': 1.31, 'MN': 1.55, 'N': 3.04, 'NA': 0.93, 'NI': 1.91, 'O': 3.44, 'P': 2.19, 'S': 2.58, 'ZN': 1.65}
- ATOMIC_MASSES = {'BR': 79.904, 'C': 12.011, 'CA': 40.078, 'CL': 35.453, 'CO': 58.933, 'CU': 63.546, 'D': 2.014, 'F': 18.998, 'FE': 55.845, 'H': 1.008, 'I': 126.904, 'K': 39.098, 'MG': 24.305, 'MN': 54.938, 'N': 14.007, 'NA': 22.99, 'NI': 58.693, 'O': 15.999, 'P': 30.974, 'S': 32.065, 'ZN': 65.38}
- DEFAULT_ATOMIC_MASS = 12.011
- MIN_HYDROGEN_RATIO = 0.25
- METAL_ELEMENTS = {'CA', 'CO', 'CU', 'FE', 'K', 'MG', 'MN', 'NA', 'NI', 'ZN'}
Standard atomic properties and constants.
Available Data:
Element symbols and atomic numbers
Van der Waals radii
Covalent radii
Atomic masses
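The mass table and `DEFAULT_ATOMIC_MASS` suggest how a method like `Structure.get_masses()` can work: look up each element and fall back to the default for unknown symbols. That fallback behavior is an assumption, as are the helper names below; the values are copied from `ATOMIC_MASSES`. A center of mass then follows as the mass-weighted mean of the coordinates, sum(m_i · r_i) / sum(m_i).

```python
import numpy as np

# Subset of AtomicData.ATOMIC_MASSES (amu) and DEFAULT_ATOMIC_MASS, copied from above.
ATOMIC_MASSES = {"C": 12.011, "H": 1.008, "N": 14.007, "O": 15.999, "P": 30.974, "S": 32.065}
DEFAULT_ATOMIC_MASS = 12.011


def masses_for(elements):
    """Mass per atom; unknown symbols fall back to DEFAULT_ATOMIC_MASS (assumed behavior)."""
    return np.array([ATOMIC_MASSES.get(e.strip().upper(), DEFAULT_ATOMIC_MASS)
                     for e in elements])


def center_of_mass(coord, elements):
    """Mass-weighted mean position: sum(m_i * r_i) / sum(m_i)."""
    m = masses_for(elements)
    return (np.asarray(coord, dtype=float) * m[:, None]).sum(axis=0) / m.sum()


print(masses_for(["C", "o", "Xx"]))  # [12.011 15.999 12.011]
print(center_of_mass([[0.0, 0.0, 0.0], [2.0, 0.0, 0.0]], ["H", "H"]))  # [1. 0. 0.]
```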
Bond Parameters
Bond detection parameters and constants.
This module provides constants for bond detection algorithms including distance thresholds, quality assessment parameters, and validation rules.
Bond length and angle parameters for different atom types.
Available Data:
Standard bond lengths
Bond angle preferences
Distance cutoffs for bond detection
PDB Constants
PDB Structure Constants
This module contains constants specifically related to PDB file processing, including residue mappings, atom classifications, and molecular recognition patterns used throughout MolR’s structure analysis components.
- molr.constants.pdb_constants.PROTEIN_SUBSTITUTIONS: Dict[str, str] = {'2AS': 'ASP', '3AH': 'HIS', '5HP': 'GLU', '5OW': 'LYS', 'ACL': 'ARG', 'AGM': 'ARG', 'AIB': 'ALA', 'ALM': 'ALA', 'ALO': 'THR', 'ALY': 'LYS', 'ARM': 'ARG', 'ASA': 'ASP', 'ASB': 'ASP', 'ASK': 'ASP', 'ASL': 'ASP', 'ASQ': 'ASP', 'AYA': 'ALA', 'BCS': 'CYS', 'BHD': 'ASP', 'BMT': 'THR', 'BNN': 'ALA', 'BUC': 'CYS', 'BUG': 'LEU', 'C5C': 'CYS', 'C6C': 'CYS', 'CAS': 'CYS', 'CCS': 'CYS', 'CEA': 'CYS', 'CGU': 'GLU', 'CHG': 'ALA', 'CLE': 'LEU', 'CME': 'CYS', 'CSD': 'ALA', 'CSO': 'CYS', 'CSP': 'CYS', 'CSS': 'CYS', 'CSW': 'CYS', 'CSX': 'CYS', 'CXM': 'MET', 'CY1': 'CYS', 'CY3': 'CYS', 'CYG': 'CYS', 'CYM': 'CYS', 'CYQ': 'CYS', 'DAH': 'PHE', 'DAL': 'ALA', 'DAR': 'ARG', 'DAS': 'ASP', 'DCY': 'CYS', 'DGL': 'GLU', 'DGN': 'GLN', 'DHA': 'ALA', 'DHI': 'HIS', 'DIL': 'ILE', 'DIV': 'VAL', 'DLE': 'LEU', 'DLY': 'LYS', 'DNP': 'ALA', 'DPN': 'PHE', 'DPR': 'PRO', 'DSN': 'SER', 'DSP': 'ASP', 'DTH': 'THR', 'DTR': 'TRP', 'DTY': 'TYR', 'DVA': 'VAL', 'EFC': 'CYS', 'FLA': 'ALA', 'FME': 'MET', 'GGL': 'GLU', 'GL3': 'GLY', 'GLZ': 'GLY', 'GMA': 'GLU', 'GSC': 'GLY', 'HAC': 'ALA', 'HAR': 'ARG', 'HIC': 'HIS', 'HIP': 'HIS', 'HMR': 'ARG', 'HPQ': 'PHE', 'HTR': 'TRP', 'HYP': 'PRO', 'IAS': 'ASP', 'IIL': 'ILE', 'IYR': 'TYR', 'KCX': 'LYS', 'LLP': 'LYS', 'LLY': 'LYS', 'LTR': 'TRP', 'LYM': 'LYS', 'LYZ': 'LYS', 'MAA': 'ALA', 'MEN': 'ASN', 'MHS': 'HIS', 'MIS': 'SER', 'MK8': 'LEU', 'MLE': 'LEU', 'MPQ': 'GLY', 'MSA': 'GLY', 'MSE': 'MET', 'MVA': 'VAL', 'NEM': 'HIS', 'NEP': 'HIS', 'NLE': 'LEU', 'NLN': 'LEU', 'NLP': 'LEU', 'NMC': 'GLY', 'OAS': 'SER', 'OCS': 'CYS', 'OMT': 'MET', 'PAQ': 'TYR', 'PCA': 'GLU', 'PEC': 'CYS', 'PHI': 'PHE', 'PHL': 'PHE', 'PR3': 'CYS', 'PRR': 'ALA', 'PTR': 'TYR', 'PYX': 'CYS', 'SAC': 'SER', 'SAR': 'GLY', 'SCH': 'CYS', 'SCS': 'CYS', 'SCY': 'CYS', 'SEL': 'SER', 'SEP': 'SER', 'SET': 'SER', 'SHC': 'CYS', 'SHR': 'LYS', 'SMC': 'CYS', 'SOC': 'CYS', 'STY': 'TYR', 'SVA': 'SER', 'TIH': 'ALA', 'TPL': 'TRP', 'TPO': 'THR', 'TPQ': 'ALA', 'TRG': 'LYS', 'TRO': 'TRP', 'TYB': 'TYR', 'TYI': 'TYR', 'TYQ': 'TYR', 'TYS': 'TYR', 'TYY': 'TYR'}
Mapping of non-standard protein residue codes to their standard amino acid equivalents.
This comprehensive dictionary provides substitutions for modified, methylated, phosphorylated, and other chemically altered amino acid residues commonly found in PDB structures. Used by PDB fixing operations to standardize protein residue names for consistent analysis.
Examples
MSE (selenomethionine) → MET (methionine)
CSO (cysteine sulfenic acid) → CYS (cysteine)
HYP (hydroxyproline) → PRO (proline)
PCA (pyroglutamic acid) → GLU (glutamic acid)
Note: This dictionary contains only protein residue substitutions. Nucleotide modifications are handled separately.
- molr.constants.pdb_constants.PROTEIN_RESIDUES: List[str] = ['ALA', 'ASN', 'CYS', 'GLU', 'HIS', 'LEU', 'MET', 'PRO', 'THR', 'TYR', 'ARG', 'ASP', 'GLN', 'GLY', 'ILE', 'LYS', 'PHE', 'SER', 'TRP', 'VAL']
Standard three-letter codes for the 20 canonical amino acid residues.
This list contains the 20 standard (canonical) amino acids in the three-letter format used in PDB files. Used for residue type validation, protein chain identification, and analysis scope determination.
- The 20 amino acids are:
Alanine (ALA), Arginine (ARG), Asparagine (ASN), Aspartic acid (ASP)
Cysteine (CYS), Glutamic acid (GLU), Glutamine (GLN), Glycine (GLY)
Histidine (HIS), Isoleucine (ILE), Leucine (LEU), Lysine (LYS)
Methionine (MET), Phenylalanine (PHE), Proline (PRO), Serine (SER)
Threonine (THR), Tryptophan (TRP), Tyrosine (TYR), Valine (VAL)
- Type:
List[str]
- molr.constants.pdb_constants.RNA_RESIDUES: List[str] = ['A', 'G', 'C', 'U', 'I']
Standard single-letter codes for RNA nucleotide residues.
- Contains the five RNA nucleotides commonly found in PDB structures:
A (Adenine): Purine base forming A-U base pairs
G (Guanine): Purine base forming G-C base pairs
C (Cytosine): Pyrimidine base forming C-G base pairs
U (Uracil): Pyrimidine base forming U-A base pairs
I (Inosine): Modified nucleotide, wobble base pairing
Used for nucleic acid chain identification and RNA structure analysis.
- Type:
List[str]
- molr.constants.pdb_constants.DNA_RESIDUES: List[str] = ['DA', 'DG', 'DC', 'DT', 'DI']
Standard two-letter codes for DNA nucleotide residues.
- Contains the five DNA nucleotides commonly found in PDB structures:
DA (Deoxyadenosine): Purine base forming A-T base pairs
DG (Deoxyguanosine): Purine base forming G-C base pairs
DC (Deoxycytidine): Pyrimidine base forming C-G base pairs
DT (Deoxythymidine): Pyrimidine base forming T-A base pairs
DI (Deoxyinosine): Modified nucleotide, wobble base pairing
Used for nucleic acid chain identification and DNA structure analysis. The ‘D’ prefix distinguishes DNA nucleotides from RNA nucleotides.
- Type:
List[str]
- molr.constants.pdb_constants.PDB_ATOM_TO_ELEMENT: Dict[str, str] = {'BR': 'BR', 'C': 'C', "C1'": 'C', 'C2': 'C', "C2'": 'C', "C3'": 'C', 'C4': 'C', "C4'": 'C', 'C5': 'C', "C5'": 'C', 'C5M': 'C', 'C6': 'C', 'C8': 'C', 'CA': 'C', 'CB': 'C', 'CD': 'C', 'CE': 'C', 'CG': 'C', 'CL': 'CL', 'CZ': 'C', 'D': 'D', 'F': 'F', 'H': 'H', 'HA': 'H', 'HB': 'H', 'HD': 'H', 'HE': 'H', 'HG': 'H', 'HH': 'H', 'HN': 'H', 'HO': 'H', 'HOH': 'H', 'HS': 'H', 'HZ': 'H', 'I': 'I', 'N': 'N', 'N1': 'N', 'N2': 'N', 'N3': 'N', 'N4': 'N', 'N6': 'N', 'N7': 'N', 'N9': 'N', 'ND1': 'N', 'ND2': 'N', 'NE': 'N', 'NE1': 'N', 'NE2': 'N', 'NH1': 'N', 'NH2': 'N', 'NZ': 'N', 'O': 'O', 'O2': 'O', "O2'": 'O', "O3'": 'O', 'O4': 'O', "O4'": 'O', "O5'": 'O', 'O6': 'O', 'OD1': 'O', 'OD2': 'O', 'OE1': 'O', 'OE2': 'O', 'OG': 'O', 'OG1': 'O', 'OH': 'O', 'OH2': 'O', 'OP1': 'O', 'OP2': 'O', 'P': 'P', 'SD': 'S', 'SG': 'S'}
Pre-computed mapping of common PDB atom names to their element types.
This dictionary provides fast lookup for the most frequently encountered PDB atom names. For comprehensive coverage, including unusual atoms, use the pdb_atom_to_element() function, which relies on regex-based pattern matching.
- Coverage includes:
Protein backbone and common side chain atoms
DNA/RNA backbone and nucleotide base atoms
Standard hydrogen atoms
Water molecules
- For full pattern-based mapping, use the pdb_atom_to_element() function, which handles:
Greek letter remoteness indicators (CA, CB, CG, CD, CE, CZ, CH)
Numbered variants (C1’, H2’’, OP1, etc.)
Ion charges (CA2+, MG2+, etc.)
IUPAC hydrogen naming conventions
Uncommon PDB atom names
- Used for:
Looking up atomic properties (radius, mass, electronegativity)
Covalent bond detection
Van der Waals calculations
Molecular mass calculations
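A dictionary-first lookup can be sketched as below. Only a handful of entries are copied from the mapping above, and the first-letter fallback is a deliberately naive stand-in, not how pdb_atom_to_element() works. The helper lookup_element() is hypothetical.

```python
# Excerpt of PDB_ATOM_TO_ELEMENT; the full mapping is listed above
PDB_ATOM_TO_ELEMENT = {
    "N": "N", "CA": "C", "C": "C", "O": "O", "CB": "C",
    "SG": "S", "OP1": "O", "C1'": "C", "CL": "CL", "BR": "BR",
}


def lookup_element(atom_name: str) -> str:
    """Return the element for a PDB atom name (hypothetical helper).

    Tries the fast dictionary first, then falls back to a naive rule that
    only stands in for the regex-based pdb_atom_to_element().
    """
    name = atom_name.strip().upper()
    if name in PDB_ATOM_TO_ELEMENT:
        return PDB_ATOM_TO_ELEMENT[name]
    # Naive fallback: first alphabetic character, e.g. "CD1" -> "C", "1HB" -> "H"
    for char in name:
        if char.isalpha():
            return char
    raise ValueError(f"cannot infer an element from {atom_name!r}")


print(lookup_element("CA"))    # C (dictionary hit)
print(lookup_element("HD21"))  # H (fallback)
```

Because the table maps CA to carbon, a name-only lookup cannot distinguish a calcium ion from an alpha carbon; the residue name has to be consulted for that, and the naive fallback would also mislabel any two-letter element missing from the table.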
- molr.constants.pdb_constants.PROTEIN_BACKBONE_ATOMS: List[str] = ['N', 'CA', 'C', 'O']
Standard protein backbone atom names in PDB format.
- Defines the four atoms that form the protein backbone (main chain):
N: Amino nitrogen atom
CA: Alpha carbon atom (central carbon)
C: Carbonyl carbon atom
O: Carbonyl oxygen atom
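These atoms are present in all standard amino acid residues (in proline, the backbone N is part of the side-chain ring and carries no hydrogen). The C of one residue bonds to the N of the next, forming the peptide bonds that connect residues.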
- Type:
List[str]
- molr.constants.pdb_constants.DNA_RNA_BACKBONE_ATOMS: List[str] = ['P', 'OP1', 'OP2', "O5'", "C5'", "C4'", "O4'", "C3'", "O3'", "C2'", "O2'", "C1'"]
Standard DNA/RNA backbone atom names in PDB format.
- Sugar-phosphate backbone atoms:
P: Phosphorus atom
OP1, OP2: Non-bridging phosphate oxygens
O5’: 5’ phosphate oxygen (bridging)
C5’: 5’ carbon of ribose/deoxyribose
C4’: 4’ carbon of ribose/deoxyribose
O4’: 4’ oxygen of ribose/deoxyribose (ring oxygen)
C3’: 3’ carbon of ribose/deoxyribose
O3’: 3’ phosphate oxygen (bridging)
C2’: 2’ carbon of ribose/deoxyribose
O2’: 2’ hydroxyl oxygen (RNA only, absent in DNA)
C1’: 1’ carbon of ribose/deoxyribose (anomeric carbon)
Note: O2’ is present in RNA but absent in DNA (deoxyribose lacks 2’ hydroxyl).
- Type:
List[str]
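The O2' note above suggests a simple atom-level check for the sugar type of a nucleotide. This is a standalone sketch with a hypothetical helper, sugar_type(), that looks only at the atom names of one residue.

```python
from typing import Iterable


def sugar_type(atom_names: Iterable[str]) -> str:
    """Classify the sugar of one nucleotide residue (hypothetical helper).

    Ribose (RNA) carries the 2' hydroxyl oxygen O2'; deoxyribose (DNA) lacks it.
    """
    names = {name.strip() for name in atom_names}
    return "ribose" if "O2'" in names else "deoxyribose"


rna_atoms = ["P", "OP1", "OP2", "O5'", "C5'", "C4'", "O4'", "C3'", "O3'", "C2'", "O2'", "C1'"]
dna_atoms = [name for name in rna_atoms if name != "O2'"]
print(sugar_type(rna_atoms))  # ribose
print(sugar_type(dna_atoms))  # deoxyribose
```

A ribonucleotide whose O2' is missing from the deposited model would be misclassified by this check, so residue names (DNA_RESIDUES, RNA_RESIDUES) are the more robust signal when they are available.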
- molr.constants.pdb_constants.BACKBONE_ATOMS: List[str] = ['N', 'CA', 'C', 'O', 'P', 'OP1', 'OP2', "O5'", "C5'", "C4'", "O4'", "C3'", "O3'", "C2'", "O2'", "C1'"]
Combined backbone atom names for proteins, DNA, and RNA in PDB format.
This list is the combination of PROTEIN_BACKBONE_ATOMS and DNA_RNA_BACKBONE_ATOMS, providing a comprehensive set of backbone atoms for all major biomolecule types.
- Used for:
Backbone hydrogen bond identification across all molecule types
Secondary structure analysis
Main chain vs side chain/base classification
Nucleic acid backbone conformation analysis
- Type:
List[str]
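Since Structure keeps atom names in a NumPy string array, a membership test against BACKBONE_ATOMS vectorizes with np.isin. The sketch below uses a hand-built atom_name array. It illustrates the idea and does not necessarily reflect how MolR computes its is_backbone flag.

```python
import numpy as np

BACKBONE_ATOMS = ["N", "CA", "C", "O", "P", "OP1", "OP2", "O5'", "C5'", "C4'",
                  "O4'", "C3'", "O3'", "C2'", "O2'", "C1'"]

# Hand-built atom names: an alanine residue followed by part of a nucleotide
atom_name = np.array(["N", "CA", "C", "O", "CB", "P", "OP1", "C1'", "N9"], dtype="U4")

# Vectorized membership test, one boolean per atom
is_backbone = np.isin(atom_name, BACKBONE_ATOMS)
print(atom_name[is_backbone])   # backbone atoms
print(atom_name[~is_backbone])  # side chain / base atoms: CB, N9
```

A test on names alone would also flag a calcium ion named CA, so a real classifier needs to combine it with the residue name.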
- molr.constants.pdb_constants.PROTEIN_SIDECHAIN_ATOMS: List[str] = ['CB', 'CG', 'CD', 'NE', 'CZ', 'NH1', 'NH2', 'OD1', 'ND2', 'OD2', 'SG', 'OE1', 'NE2', 'OE2', 'CD2', 'ND1', 'CE1', 'CG1', 'CG2', 'CD1', 'CE', 'NZ', 'SD', 'CE2', 'OG', 'OG1', 'NE1', 'CE3', 'CZ2', 'CZ3', 'CH2', 'OH']
Common protein side chain atom names in PDB format.
- Comprehensive list of side chain (R-group) atoms found in the 20 standard amino acids:
Aliphatic carbons: CB, CG, CG1, CG2, CD, CE, CZ (branching from CA)
Aromatic carbons: CD1/CD2, CE1/CE2/CE3, CZ2/CZ3, CH2 (ring systems; CD1 and CD2 also appear in leucine, and CD1 in isoleucine)
Nitrogen atoms: NE, NH1, NH2, ND1, ND2, NE1, NE2, NZ (amine, amide, guanidinium, imidazole, and indole groups)
Oxygen atoms: OD1, OD2, OE1, OE2, OG, OG1, OH (carboxylate, amide, and hydroxyl groups)
Sulfur atoms: SG, SD (cysteine, methionine)
- Used for:
Side chain interaction analysis
Functional group identification
Hydrogen bond donor/acceptor classification
- Type:
List[str]
- molr.constants.pdb_constants.DNA_RNA_BASE_ATOMS: List[str] = ['N1', 'C2', 'N3', 'C4', 'C5', 'C6', 'N6', 'N7', 'C8', 'N9', 'O6', 'N2', 'O2', 'N4', 'O4', 'C5M']
Common DNA/RNA base atom names in PDB format.
Base atoms found in nucleotides:
- Purine bases (Adenine, Guanine):
N1, C2, N3, C4, C5, C6: Six-membered ring atoms
N7, C8, N9: Five-membered ring atoms
N6: Amino group on adenine
O6, N2: Functional groups on guanine
- Pyrimidine bases (Cytosine, Thymine, Uracil):
N1, C2, N3, C4, C5, C6: Six-membered ring atoms
O2: Carbonyl oxygen at position 2
N4: Amino group on cytosine
O4: Carbonyl oxygen at position 4 (thymine/uracil)
C5M: Methyl group on thymine (also called C7)
- Used for:
Base-base interactions (hydrogen bonding, stacking)
Protein-nucleic acid recognition
Base functional group identification
- Type:
List[str]
- molr.constants.pdb_constants.SIDECHAIN_ATOMS: List[str] = ['CB', 'CG', 'CD', 'NE', 'CZ', 'NH1', 'NH2', 'OD1', 'ND2', 'OD2', 'SG', 'OE1', 'NE2', 'OE2', 'CD2', 'ND1', 'CE1', 'CG1', 'CG2', 'CD1', 'CE', 'NZ', 'SD', 'CE2', 'OG', 'OG1', 'NE1', 'CE3', 'CZ2', 'CZ3', 'CH2', 'OH', 'N1', 'C2', 'N3', 'C4', 'C5', 'C6', 'N6', 'N7', 'C8', 'N9', 'O6', 'N2', 'O2', 'N4', 'O4', 'C5M']
Combined side chain and base atoms for proteins and nucleic acids.
This list is the combination of PROTEIN_SIDECHAIN_ATOMS and DNA_RNA_BASE_ATOMS, providing a comprehensive set of non-backbone atoms for all major biomolecule types.
- Used for:
Side chain/base interaction analysis
Distinguishing backbone from functional groups
Molecular recognition studies
- Type:
List[str]
- molr.constants.pdb_constants.WATER_MOLECULES: List[str] = ['HOH', 'WAT', 'DOD', 'TIP3', 'TIP4', 'TIP5', 'W']
Standard water molecule residue names in PDB files.
- Recognition patterns for different water representations:
HOH: Standard PDB water molecule designation
WAT: Alternative water molecule name
DOD: Deuterated water (heavy water)
TIP3: TIP3P water model (3-point)
TIP4: TIP4P water model (4-point)
TIP5: TIP5P water model (5-point)
W: Abbreviated water designation
- Used for:
Water molecule identification in PDB structures
Solvent exclusion during analysis
Water-mediated interaction detection
Hydration shell analysis
- Type:
List[str]
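Solvent exclusion reduces to a vectorized membership test over residue names. This standalone sketch uses a hand-built res_name array ("LIG" is a placeholder ligand name). One caution follows from NumPy's fixed-width strings rather than from anything documented for MolR: if res_name is stored as U3, as listed for Structure, the four-character names TIP3, TIP4, and TIP5 would be truncated to TIP and would no longer match this list.

```python
import numpy as np

WATER_MOLECULES = ["HOH", "WAT", "DOD", "TIP3", "TIP4", "TIP5", "W"]

# Residue name for each atom; "U4" is wide enough for the TIP* names
res_name = np.array(["ALA", "ALA", "HOH", "TIP3", "LIG", "WAT"], dtype="U4")

is_water = np.isin(res_name, WATER_MOLECULES)
solute_indices = np.flatnonzero(~is_water)
print(solute_indices)  # [0 1 4]

# A "U3" array silently truncates "TIP3" to "TIP", which no longer matches
truncated = np.array(["TIP3"], dtype="U3")
print(truncated[0], np.isin(truncated, WATER_MOLECULES)[0])  # TIP False
```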
- molr.constants.pdb_constants.RESIDUES: List[str] = ['ALA', 'ASN', 'CYS', 'GLU', 'HIS', 'LEU', 'MET', 'PRO', 'THR', 'TYR', 'ARG', 'ASP', 'GLN', 'GLY', 'ILE', 'LYS', 'PHE', 'SER', 'TRP', 'VAL', 'DA', 'DG', 'DC', 'DT', 'DI', 'A', 'G', 'C', 'U', 'I', 'HOH', 'WAT', 'DOD', 'TIP3', 'TIP4', 'TIP5', 'W']
Combined list of all standard residue codes for proteins, DNA, RNA, and water.
This list is the combination of PROTEIN_RESIDUES, DNA_RESIDUES, RNA_RESIDUES, and WATER_MOLECULES, providing a comprehensive set of standard residues found in biomolecular structures.
- Used for:
General residue type validation
Distinguishing standard residues from heterogens
Biomolecule type identification
- Type:
List[str]
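Distinguishing standard residues from heterogens is then a set-membership test. The sketch copies RESIDUES into a frozenset for constant-time lookup; the helper is_heterogen() is hypothetical.

```python
# Copied from RESIDUES above
RESIDUES = frozenset([
    "ALA", "ASN", "CYS", "GLU", "HIS", "LEU", "MET", "PRO", "THR", "TYR",
    "ARG", "ASP", "GLN", "GLY", "ILE", "LYS", "PHE", "SER", "TRP", "VAL",
    "DA", "DG", "DC", "DT", "DI", "A", "G", "C", "U", "I",
    "HOH", "WAT", "DOD", "TIP3", "TIP4", "TIP5", "W",
])


def is_heterogen(res_name: str) -> bool:
    """True if a residue name is not a standard residue (hypothetical helper)."""
    return res_name.strip().upper() not in RESIDUES


for name in ["GLY", "DT", "HOH", "ATP", "MSE"]:
    print(name, is_heterogen(name))
```

Water is treated as standard because WATER_MOLECULES is part of the list, while modified residues such as selenomethionine (MSE) are reported as heterogens because they are absent from it.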
- molr.constants.pdb_constants.RESIDUES_WITH_AROMATIC_RINGS: List[str] = ['PHE', 'TYR', 'TRP', 'HIS', 'HID', 'HIE', 'HIP', 'TYI', 'TYQ', 'TYB', 'DA', 'DG', 'DC', 'DT', 'A', 'G', 'C', 'U']
Residues containing aromatic rings in their structures.
- Protein residues:
PHE: Phenylalanine (benzene ring)
TYR: Tyrosine (phenolic ring)
TRP: Tryptophan (indole ring)
HIS: Histidine (imidazole ring)
HID, HIE, HIP: Different protonation states of histidine
TYI, TYQ, TYB: Variants of tyrosine with modifications
- DNA nucleotides:
DA: Deoxyadenosine (purine ring: adenine)
DG: Deoxyguanosine (purine ring: guanine)
DC: Deoxycytidine (pyrimidine ring: cytosine)
DT: Deoxythymidine (pyrimidine ring: thymine)
- RNA nucleotides:
A: Adenine (purine ring)
G: Guanine (purine ring)
C: Cytosine (pyrimidine ring)
U: Uracil (pyrimidine ring)
- Used for:
Aromatic interaction analysis
π-π stacking detection between proteins and nucleic acids
DNA/RNA-protein interface studies
- Type:
List[str]
- molr.constants.pdb_constants.HYDROGEN_ELEMENTS: List[str] = ['H', 'D']
Hydrogen element types including isotopes.
- Contains the hydrogen element symbols commonly found in PDB structures:
H: Standard hydrogen (protium)
D: Deuterium (heavy hydrogen isotope)
- Used for:
Hydrogen bond donor/acceptor detection
Identifying hydrogen atoms in molecular interactions
Mass calculations and isotope effects
NMR-related structural analysis
- Type:
List[str]
- molr.constants.pdb_constants.HALOGEN_ELEMENTS: List[str] = ['F', 'CL', 'BR', 'I']
Elements that can act as halogen bond donors.
When covalently bonded to carbon (C-X…Y geometry), these halogens develop a σ-hole: a region of positive electrostatic potential on the halogen, opposite the C-X bond. The σ-hole accepts electron density from electron-rich regions on acceptor atoms, so the halogen is the electron acceptor even though it is called the halogen bond donor.
F: Fluorine (weakest halogen bond donor due to high electronegativity)
CL: Chlorine (common in drug design, moderate halogen bonding)
BR: Bromine (strong halogen bond donor, commonly studied)
I: Iodine (strongest halogen bond donor due to large, polarizable electron cloud)
- Type:
List[str]
- molr.constants.pdb_constants.HYDROGEN_BOND_DONOR_ELEMENTS: List[str] = ['N', 'O', 'S', 'F']
Elements that can act as hydrogen bond donors.
These elements can form hydrogen bonds when covalently bonded to hydrogen atoms (D-H…A geometry). They are electronegative enough to polarize the D-H bond, creating a partial positive charge on the hydrogen that can interact with electron-rich acceptor atoms.
N: Nitrogen (amino groups, ring nitrogens, strong donors)
O: Oxygen (hydroxyl groups, moderate to strong donors)
S: Sulfur (thiol groups, weak donors due to lower electronegativity)
F: Fluorine (only as H-F, which is rare in biomolecular structures)
- Type:
List[str]
- molr.constants.pdb_constants.HYDROGEN_BOND_ACCEPTOR_ELEMENTS: List[str] = ['N', 'O', 'S', 'F', 'CL']
Elements that can act as hydrogen bond acceptors.
These electronegative elements have lone pairs of electrons that can accept hydrogen bonds from donor atoms (D-H…A geometry). They can form favorable electrostatic interactions with the partial positive charge on hydrogen.
N: Nitrogen (lone pairs on amino groups, ring nitrogens)
O: Oxygen (lone pairs on carbonyl, hydroxyl, ether groups - strongest acceptors)
S: Sulfur (lone pairs on thiol, sulfide groups - weaker acceptors)
F: Fluorine (highest electronegativity, but covalently bound fluorine is a weak acceptor; rare in proteins)
CL: Chlorine (weak acceptor when covalently bound; sometimes found in modified residues and ligands)
- Type:
List[str]
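A D-H…A check combines these element lists with a distance and an angle test. The sketch below is an illustration only: the cutoffs (H…A at most 2.5 Å, D-H…A angle at least 120°) are assumptions chosen for the example, not MolR defaults, and is_hydrogen_bond() is a hypothetical helper.

```python
import numpy as np

HYDROGEN_BOND_DONOR_ELEMENTS = ["N", "O", "S", "F"]
HYDROGEN_BOND_ACCEPTOR_ELEMENTS = ["N", "O", "S", "F", "CL"]


def is_hydrogen_bond(donor_xyz, h_xyz, acceptor_xyz, donor_element, acceptor_element,
                     max_ha_distance=2.5, min_dha_angle=120.0):
    """Check a D-H...A contact with illustrative cutoffs (Angstrom, degrees)."""
    if donor_element.upper() not in HYDROGEN_BOND_DONOR_ELEMENTS:
        return False
    if acceptor_element.upper() not in HYDROGEN_BOND_ACCEPTOR_ELEMENTS:
        return False
    d, h, a = (np.asarray(p, dtype=float) for p in (donor_xyz, h_xyz, acceptor_xyz))
    h_to_d, h_to_a = d - h, a - h
    ha_distance = np.linalg.norm(h_to_a)
    # D-H...A angle measured at the hydrogen; 180 degrees is a linear contact
    cos_angle = h_to_d @ h_to_a / (np.linalg.norm(h_to_d) * ha_distance)
    angle = np.degrees(np.arccos(np.clip(cos_angle, -1.0, 1.0)))
    return bool(ha_distance <= max_ha_distance and angle >= min_dha_angle)


# Linear N-H...O contact: H...A = 1.9 Angstrom, angle = 180 degrees
print(is_hydrogen_bond([0, 0, 0], [1.0, 0, 0], [2.9, 0, 0], "N", "O"))    # True
# Same H...A distance, but a 90-degree angle
print(is_hydrogen_bond([0, 0, 0], [1.0, 0, 0], [1.0, 1.9, 0], "N", "O"))  # False
```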
- molr.constants.pdb_constants.HALOGEN_BOND_ACCEPTOR_ELEMENTS: List[str] = ['N', 'O', 'S']
Elements that can act as halogen bond acceptors.
These electronegative atoms can donate electron density to the σ-hole of halogen atoms in halogen bonds. They typically have lone pairs of electrons that can interact with the positive electrostatic potential of the halogen.
N: Nitrogen (lone pairs on amino groups, ring nitrogens)
O: Oxygen (lone pairs on carbonyl, hydroxyl, ether groups)
S: Sulfur (lone pairs on thiol, sulfide groups, weaker than N/O)
- Type:
List[str]
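Because the σ-hole lies along the extension of the C-X bond, a halogen bond check is usually phrased as an X…Y distance plus a C-X…Y angle close to 180°. This is a sketch under assumed cutoffs (X…Y at most 3.5 Å, angle at least 150°) that are not MolR defaults; is_halogen_bond() is hypothetical.

```python
import numpy as np

HALOGEN_ELEMENTS = ["F", "CL", "BR", "I"]
HALOGEN_BOND_ACCEPTOR_ELEMENTS = ["N", "O", "S"]


def is_halogen_bond(c_xyz, x_xyz, y_xyz, x_element, y_element,
                    max_xy_distance=3.5, min_cxy_angle=150.0):
    """Check a C-X...Y contact with illustrative cutoffs (Angstrom, degrees)."""
    if x_element.upper() not in HALOGEN_ELEMENTS:
        return False
    if y_element.upper() not in HALOGEN_BOND_ACCEPTOR_ELEMENTS:
        return False
    c, x, y = (np.asarray(p, dtype=float) for p in (c_xyz, x_xyz, y_xyz))
    x_to_c, x_to_y = c - x, y - x
    xy_distance = np.linalg.norm(x_to_y)
    # Angle at the halogen; the sigma-hole favors values near 180 degrees
    cos_angle = x_to_c @ x_to_y / (np.linalg.norm(x_to_c) * xy_distance)
    angle = np.degrees(np.arccos(np.clip(cos_angle, -1.0, 1.0)))
    return bool(xy_distance <= max_xy_distance and angle >= min_cxy_angle)


print(is_halogen_bond([0, 0, 0], [1.9, 0, 0], [4.9, 0, 0], "BR", "O"))    # True
print(is_halogen_bond([0, 0, 0], [1.9, 0, 0], [1.9, 3.0, 0], "BR", "O"))  # False
```

Published criteria often scale the distance cutoff by the sum of the van der Waals radii of X and Y instead of using one fixed value; this sketch keeps a single number for brevity.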
- molr.constants.pdb_constants.PI_INTERACTION_DONOR: List[str] = ['C']
Elements that can act as π-interaction donors.
These atoms can participate in π-interactions when part of π-systems.
- Currently includes:
C: Carbon atoms
- Type:
List[str]
- molr.constants.pdb_constants.PI_INTERACTION_ATOMS: List[str] = ['H', 'F', 'CL']
Elements that can participate in π-interactions by contacting an aromatic π system, for example the hydrogen in X-H…π interactions.
- Type:
List[str]
- molr.constants.pdb_constants.RING_ATOMS_FOR_RESIDUES_WITH_AROMATIC_RINGS: Dict[str, List[str]] = {'A': ['N9', 'C8', 'N7', 'C5', 'C6', 'N1', 'C2', 'N3', 'C4'], 'C': ['N1', 'C2', 'N3', 'C4', 'C5', 'C6'], 'DA': ['N9', 'C8', 'N7', 'C5', 'C6', 'N1', 'C2', 'N3', 'C4'], 'DC': ['N1', 'C2', 'N3', 'C4', 'C5', 'C6'], 'DG': ['N9', 'C8', 'N7', 'C5', 'C6', 'N1', 'C2', 'N3', 'C4'], 'DT': ['N1', 'C2', 'N3', 'C4', 'C5', 'C6'], 'G': ['N9', 'C8', 'N7', 'C5', 'C6', 'N1', 'C2', 'N3', 'C4'], 'HID': ['CG', 'ND1', 'CD2', 'CE1', 'NE2'], 'HIE': ['CG', 'ND1', 'CD2', 'CE1', 'NE2'], 'HIP': ['CG', 'ND1', 'CD2', 'CE1', 'NE2'], 'HIS': ['CG', 'ND1', 'CD2', 'CE1', 'NE2'], 'PHE': ['CG', 'CD1', 'CD2', 'CE1', 'CE2', 'CZ'], 'TRP': ['CG', 'CD1', 'CD2', 'NE1', 'CE2', 'CE3', 'CZ2', 'CZ3', 'CH2'], 'TYB': ['CG', 'CD1', 'CD2', 'CE1', 'CE2', 'CZ'], 'TYI': ['CG', 'CD1', 'CD2', 'CE1', 'CE2', 'CZ'], 'TYQ': ['CG', 'CD1', 'CD2', 'CE1', 'CE2', 'CZ'], 'TYR': ['CG', 'CD1', 'CD2', 'CE1', 'CE2', 'CZ'], 'U': ['N1', 'C2', 'N3', 'C4', 'C5', 'C6']}
Mapping of aromatic residues to their ring atom names.
This dictionary provides the specific atom names that form aromatic ring systems for each residue type containing aromatic groups.
Protein residues:
- Phenylalanine (PHE):
6-membered benzene ring: CG-CD1-CE1-CZ-CE2-CD2
- Tyrosine (TYR) and variants (TYI, TYQ, TYB):
6-membered phenolic ring: CG-CD1-CE1-CZ-CE2-CD2
TYI: Ionized tyrosine (deprotonated hydroxyl)
TYQ: Quinone form of tyrosine
TYB: Brominated tyrosine
- Tryptophan (TRP):
5-membered pyrrole ring: CG-CD1-NE1-CE2-CD2
6-membered benzene ring: CD2-CE2-CZ2-CH2-CZ3-CE3
Forms a bicyclic indole system
- Histidine (HIS, HID, HIE, HIP):
5-membered imidazole ring: CG-ND1-CE1-NE2-CD2
HID: Delta protonated (H on ND1)
HIE: Epsilon protonated (H on NE2)
HIP: Both nitrogens protonated (positive charge)
DNA nucleotides:
- Adenine (DA) and Guanine (DG), purine bases:
5-membered ring: N9-C8-N7-C5-C4
6-membered ring: C5-C6-N1-C2-N3-C4
Forms a bicyclic purine system
- Cytosine (DC) and Thymine (DT), pyrimidine bases:
6-membered ring: N1-C2-N3-C4-C5-C6
RNA nucleotides:
- Adenine (A) and Guanine (G), purine bases:
Same purine ring system as DNA counterparts
- Cytosine (C) and Uracil (U), pyrimidine bases:
Same pyrimidine ring system as DNA counterparts
- Used for:
Calculating aromatic ring centroids for π interactions
Identifying atoms involved in π-π stacking
Determining ring plane orientations
X-H…π interaction analysis where these atoms form the π system
DNA/RNA-protein interface interactions
Nucleotide base stacking analysis
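A ring centroid and plane normal can be computed from these atom names with NumPy: the centroid is the mean position, and the normal is the right singular vector with the smallest singular value of the centered ring coordinates. This is a standalone sketch with a hypothetical helper, ring_centroid_and_normal(), and a hand-built flat ring; it is not presented as MolR's implementation.

```python
import numpy as np

# Excerpt of RING_ATOMS_FOR_RESIDUES_WITH_AROMATIC_RINGS
RING_ATOMS = {"PHE": ["CG", "CD1", "CD2", "CE1", "CE2", "CZ"]}


def ring_centroid_and_normal(atom_names, coords, res_name):
    """Centroid and unit plane normal of one residue's aromatic ring."""
    index = {name: i for i, name in enumerate(atom_names)}
    ring_xyz = coords[[index[name] for name in RING_ATOMS[res_name]]]
    centroid = ring_xyz.mean(axis=0)
    # Best-fit plane: the direction of least variance is the plane normal
    _, _, vt = np.linalg.svd(ring_xyz - centroid)
    return centroid, vt[-1]


# A flat hexagonal ring in the plane z = 1, plus an out-of-ring CB atom
angles = np.deg2rad(np.arange(0, 360, 60))
ring = np.column_stack([1.39 * np.cos(angles), 1.39 * np.sin(angles), np.ones(6)])
names = ["CG", "CD1", "CE1", "CZ", "CE2", "CD2", "CB"]
coords = np.vstack([ring, [[2.9, 0.0, 1.0]]])
centroid, normal = ring_centroid_and_normal(names, coords, "PHE")
print(np.round(centroid, 3), np.round(np.abs(normal), 3))
```

For TRP and the purines the dictionary lists both fused rings together, so this yields one centroid for the whole bicyclic system; per-ring centroids would need the 5- and 6-membered subsets listed above. The sign of the normal is arbitrary, so angle tests against it should use its absolute cosine.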
- molr.constants.pdb_constants.HYDROPHOBIC_RESIDUES: List[str] = ['VAL', 'LEU', 'ILE', 'MET', 'PHE', 'TRP', 'PRO', 'ALA']
Hydrophobic amino acid residues with nonpolar side chains.
- These amino acids have side chains that are predominantly nonpolar and hydrophobic:
VAL (Valine): Branched aliphatic chain
LEU (Leucine): Branched aliphatic chain
ILE (Isoleucine): Branched aliphatic chain
MET (Methionine): Sulfur-containing nonpolar chain
PHE (Phenylalanine): Aromatic benzyl group
TRP (Tryptophan): Aromatic indole group
PRO (Proline): Cyclic imino acid structure
ALA (Alanine): Simple methyl group
- Used for:
Hydrophobic interaction analysis
Protein folding studies
Membrane protein analysis
Hydrophobic patch identification
- Type:
List[str]
- molr.constants.pdb_constants.CHARGED_RESIDUES: List[str] = ['ARG', 'LYS', 'ASP', 'GLU', 'HIS']
Charged amino acid residues with ionizable side chains.
- These amino acids have ionizable side chains; all except histidine are predominantly charged at physiological pH:
ARG (Arginine): Positively charged guanidinium group (+1)
LYS (Lysine): Positively charged ammonium group (+1)
ASP (Aspartic acid): Negatively charged carboxylate group (-1)
GLU (Glutamic acid): Negatively charged carboxylate group (-1)
HIS (Histidine): Imidazole group (pKa ~6), positively charged as imidazolium only when protonated, so largely neutral at pH 7.4
- Used for:
Electrostatic interaction analysis
Salt bridge identification
pH-dependent behavior studies
Ion binding site analysis
- Type:
List[str]
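Salt bridge identification can be sketched as a pairwise distance search between cationic and anionic side chain atoms. The charged-group atom names, the 4.0 Å cutoff, and the omission of HIS are all assumptions made for this illustration, not documented MolR behavior; find_salt_bridges() is hypothetical and ignores chain IDs.

```python
import numpy as np

# Assumed charged-group atoms (illustrative; not a documented MolR constant).
# HIS is left out because it is largely neutral at pH 7.4.
CATIONIC_ATOMS = {"ARG": {"NE", "NH1", "NH2"}, "LYS": {"NZ"}}
ANIONIC_ATOMS = {"ASP": {"OD1", "OD2"}, "GLU": {"OE1", "OE2"}}


def find_salt_bridges(res_names, atom_names, res_ids, coords, cutoff=4.0):
    """Return sorted (cation res_id, anion res_id) pairs within `cutoff` Angstrom."""
    atoms = list(zip(res_names, atom_names))
    cat_idx = np.array([i for i, (r, a) in enumerate(atoms)
                        if a in CATIONIC_ATOMS.get(r, ())], dtype=int)
    an_idx = np.array([i for i, (r, a) in enumerate(atoms)
                       if a in ANIONIC_ATOMS.get(r, ())], dtype=int)
    if cat_idx.size == 0 or an_idx.size == 0:
        return []
    coords = np.asarray(coords, dtype=float)
    # Distance from every cationic atom to every anionic atom
    diff = coords[cat_idx][:, None, :] - coords[an_idx][None, :, :]
    dist = np.linalg.norm(diff, axis=-1)
    found = {(int(res_ids[cat_idx[i]]), int(res_ids[an_idx[j]]))
             for i, j in zip(*np.nonzero(dist <= cutoff))}
    return sorted(found)


res_names = ["LYS", "ASP", "ASP", "GLU", "ALA"]
atom_names = ["NZ", "OD1", "OD2", "OE1", "CA"]
res_ids = [10, 20, 20, 30, 40]
coords = [[0, 0, 0], [3.0, 0, 0], [3.5, 1.0, 0], [10, 0, 0], [1, 1, 1]]
print(find_salt_bridges(res_names, atom_names, res_ids, coords))  # [(10, 20)]
```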
- molr.constants.pdb_constants.RESIDUE_TYPES: List[str] = ['DNA', 'RNA', 'PROTEIN', 'LIGAND']
Standard residue type classifications for molecular analysis.
- Classification categories for different types of molecular residues:
DNA: Deoxyribonucleotide residues (DA, DG, DC, DT, DI)
RNA: Ribonucleotide residues (A, G, C, U, I)
PROTEIN: Amino acid residues (20 standard amino acids and variants)
LIGAND: Ligands, cofactors, metals, and other heteroatom residues
- Used for:
Residue type identification and classification
Molecular component analysis
Structure validation and processing
Interaction type determination
- Type:
List[str]
- molr.constants.pdb_constants.RESIDUE_TYPE_CODES: Dict[str, str] = {'DNA': 'D', 'LIGAND': 'L', 'PROTEIN': 'P', 'RNA': 'R'}
Single letter codes for residue types.
- Mapping of full residue type names to compact single letter codes:
“DNA” → “D”: Deoxyribonucleotide residues
“RNA” → “R”: Ribonucleotide residues
“PROTEIN” → “P”: Amino acid residues
“LIGAND” → “L”: Ligands, cofactors, metals, and other heteroatom residues
Used for compact representation in hydrogen bond descriptions and atom records.
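Classifying a residue name into one of RESIDUE_TYPES and then into its single-letter code can be sketched as follows. The protein set is just the 20 standard amino acids taken from RESIDUES above (the real PROTEIN_RESIDUES may include variants), and letting water fall through to LIGAND is a simplification of this sketch; how MolR actually categorizes water is not stated here. residue_type() is hypothetical.

```python
# The 20 standard amino acids, copied from the RESIDUES list above
PROTEIN_RESIDUES = {
    "ALA", "ASN", "CYS", "GLU", "HIS", "LEU", "MET", "PRO", "THR", "TYR",
    "ARG", "ASP", "GLN", "GLY", "ILE", "LYS", "PHE", "SER", "TRP", "VAL",
}
DNA_RESIDUES = {"DA", "DG", "DC", "DT", "DI"}
RNA_RESIDUES = {"A", "G", "C", "U", "I"}
RESIDUE_TYPE_CODES = {"DNA": "D", "LIGAND": "L", "PROTEIN": "P", "RNA": "R"}


def residue_type(res_name: str) -> str:
    """Classify a residue name into one of RESIDUE_TYPES (hypothetical helper)."""
    name = res_name.strip().upper()
    if name in PROTEIN_RESIDUES:
        return "PROTEIN"
    if name in DNA_RESIDUES:
        return "DNA"
    if name in RNA_RESIDUES:
        return "RNA"
    # Everything else, including water in this sketch, falls through to LIGAND
    return "LIGAND"


for name in ["GLY", "DT", "U", "ATP"]:
    kind = residue_type(name)
    print(name, kind, RESIDUE_TYPE_CODES[kind])
```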
- molr.constants.pdb_constants.BACKBONE_SIDECHAIN_CODES: Dict[str, str] = {'BACKBONE': 'B', 'NOT_APPLICABLE': 'N', 'SIDECHAIN': 'S'}
Single letter codes for backbone vs sidechain classification.
- Mapping of atom structural classification to compact single letter codes:
“BACKBONE” → “B”: Main chain atoms (protein backbone, DNA/RNA sugar-phosphate)
“SIDECHAIN” → “S”: Side chain atoms (protein R-groups, nucleotide bases)
“NOT_APPLICABLE” → “N”: Atoms for which the backbone/side chain distinction does not apply
Used for describing hydrogen bond donor-acceptor relationships (e.g., S-S, S-B, B-B).
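Composing a donor-acceptor label such as "S-B" from these codes might look like the sketch below. The classification rule (non-polymer atoms get "N", backbone names get "B", every other polymer atom gets "S") and the donor-first ordering are assumptions made for illustration; both helpers are hypothetical.

```python
BACKBONE_SIDECHAIN_CODES = {"BACKBONE": "B", "NOT_APPLICABLE": "N", "SIDECHAIN": "S"}
BACKBONE_ATOMS = frozenset([
    "N", "CA", "C", "O", "P", "OP1", "OP2", "O5'", "C5'",
    "C4'", "O4'", "C3'", "O3'", "C2'", "O2'", "C1'",
])


def backbone_sidechain_code(atom_name: str, in_polymer: bool) -> str:
    """Single-letter B/S/N code for one atom (hypothetical rule)."""
    if not in_polymer:
        # Atoms outside protein/nucleic acid chains, where the distinction does not apply
        return BACKBONE_SIDECHAIN_CODES["NOT_APPLICABLE"]
    key = "BACKBONE" if atom_name.strip() in BACKBONE_ATOMS else "SIDECHAIN"
    return BACKBONE_SIDECHAIN_CODES[key]


def donor_acceptor_label(donor, acceptor) -> str:
    """Format a pair such as "S-B"; each argument is (atom_name, in_polymer)."""
    return f"{backbone_sidechain_code(*donor)}-{backbone_sidechain_code(*acceptor)}"


print(donor_acceptor_label(("OG", True), ("O", True)))    # S-B (serine OG to a backbone O)
print(donor_acceptor_label(("N", True), ("O", True)))     # B-B
print(donor_acceptor_label(("O2", False), ("NZ", True)))  # N-S
```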
- molr.constants.pdb_constants.AROMATIC_CODES: Dict[str, str] = {'AROMATIC': 'A', 'NON-AROMATIC': 'N'}
Single letter codes for aromatic classification.
- Mapping of aromatic property classification to compact single letter codes:
“AROMATIC” → “A”: Atoms that are part of aromatic ring systems
“NON-AROMATIC” → “N”: Atoms that are not part of aromatic ring systems
Used for identifying atoms involved in π-interactions and aromatic stacking.
PDB format constants and mappings.
Available Data:
Record type definitions
Standard residue names
Chain identifier mappings
Residue Bond Templates
Residue Bond Information Constants
This module contains bond connectivity information for standard residues extracted from the Chemical Component Dictionary (CCD). This data is used for molecular structure validation and bond detection in MolR.
Generated automatically from CCD BinaryCIF files.
- molr.constants.residue_bonds.get_residue_bonds(residue)
Get bond information for a specific residue.
- molr.constants.residue_bonds.get_residue_bond_count(residue)
Get the total number of bonds for a residue.
- molr.constants.residue_bonds.has_aromatic_bonds(residue)
Check if a residue has aromatic bonds.
Standard topology templates for common residues.
Available Templates:
Amino acid topologies
Nucleotide topologies
Common ligand templates
Metal coordination patterns
Configuration
Configuration for MolR package.
This module provides configuration paths and settings for the MolR package, including paths for CCD data storage.
- molr.config.get_molr_data_dir()
Get the MolR data directory path.
- Return type:
Path
- Returns:
Path to MolR data directory (~/.molr)
- molr.config.get_ccd_data_path()
Get the path for CCD data storage.
- Return type:
Path
- Returns:
Path to CCD data directory (~/.molr/ccd-data)
Global configuration settings for MolR.
Configuration Options:
Default bond detection parameters
Spatial indexing settings
I/O parser options
Selection language settings
Type Hints
MolR provides complete type hint coverage. Key type aliases:
from typing import Tuple
import numpy as np
# Common type aliases used throughout MolR
AtomIndex = int
AtomMask = np.ndarray  # Boolean array for atom selection
Coordinates = np.ndarray  # Shape (n_atoms, 3)
BondPair = Tuple[AtomIndex, AtomIndex]
SelectionString = str
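As an illustration of how these aliases read in a signature, here are two small functions annotated with them. Both bond_lengths() and mask_to_indices() are hypothetical helpers written for this sketch, not part of the MolR API.

```python
from typing import List, Tuple

import numpy as np

# Aliases as listed above
AtomIndex = int
AtomMask = np.ndarray
Coordinates = np.ndarray
BondPair = Tuple[AtomIndex, AtomIndex]


def bond_lengths(coords: Coordinates, bonds: List[BondPair]) -> np.ndarray:
    """Length of each bonded pair, in the units of `coords` (hypothetical helper)."""
    pairs = np.asarray(bonds, dtype=int).reshape(-1, 2)
    return np.linalg.norm(coords[pairs[:, 0]] - coords[pairs[:, 1]], axis=1)


def mask_to_indices(mask: AtomMask) -> List[AtomIndex]:
    """Convert a boolean selection mask into a list of atom indices."""
    return [int(i) for i in np.flatnonzero(mask)]


coords = np.array([[0.0, 0.0, 0.0], [1.5, 0.0, 0.0], [1.5, 1.0, 0.0]])
print(bond_lengths(coords, [(0, 1), (1, 2)]))
print(mask_to_indices(np.array([True, False, True])))  # [0, 2]
```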
Usage Examples
Here are some common usage patterns for the API:
import molr
import numpy as np
# Load and analyze structure
structure = molr.Structure.from_pdb("protein.pdb")
bonds = structure.detect_bonds()
# Selection operations
backbone = structure.select("backbone")
active_site = structure.select("within 5.0 of (resname LIG)")
# Spatial queries
neighbors = structure.get_neighbors_within(100, 5.0)
sphere_atoms = structure.get_atoms_within_sphere([0, 0, 0], 10.0)
# Bond analysis
atom_neighbors = bonds.get_neighbors(100)
connectivity = bonds.to_connectivity_matrix(structure.n_atoms)
# Custom bond detection
from molr.bond_detection import DefaultBondDetector
detector = DefaultBondDetector()
custom_bonds = detector.detect_bonds(structure)
For more detailed examples, see the Examples section.