API Reference

This section provides detailed documentation for all MolR classes and functions.

Core Classes

Structure

class molr.Structure(n_atoms)

Bases: object

A NumPy-based molecular structure representation using Structure of Arrays (SoA) design.

This class stores molecular data in separate NumPy arrays for efficient vectorized operations and memory usage. Supports optional annotations with lazy initialization to save memory when not needed.

Core annotations (always present):
  • coord: Atomic coordinates (x, y, z) as float64

  • atom_name: PDB atom names as U4 strings

  • element: Element symbols as U2 strings

  • res_name: Residue names as U3 strings

  • res_id: Residue sequence numbers as int32

  • chain_id: Chain identifiers as U1 strings

Optional annotations (lazy initialization):
  • alt_loc: Alternate location indicators

  • occupancy: Occupancy values

  • b_factor: Temperature factors

  • charge: Formal charges

  • serial: Atom serial numbers

  • insertion_code: Residue insertion codes

  • segment_id: Segment identifiers

Classification flags (computed on demand):
  • is_backbone: Boolean array for backbone atoms

  • is_sidechain: Boolean array for sidechain atoms

  • is_aromatic: Boolean array for aromatic atoms

  • is_ligand: Boolean array for ligand atoms

  • residue_type: Residue type classification
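To make the Structure-of-Arrays layout and the lazy optional annotations concrete, here is a minimal sketch in plain NumPy. The class SoASketch and its _OPTIONAL table are hypothetical and written only for illustration; they are not MolR's implementation, and the dtypes and fill values chosen for the optional columns are assumptions.

```python
import numpy as np


class SoASketch:
    """Hypothetical Structure-of-Arrays container (illustration only, not molr.Structure)."""

    # Assumed dtype and fill value for each lazily created optional annotation.
    _OPTIONAL = {
        "occupancy": (np.float32, 1.0),
        "b_factor": (np.float32, 0.0),
    }

    def __init__(self, n_atoms: int) -> None:
        if n_atoms <= 0:
            raise ValueError("n_atoms must be positive")
        self.n_atoms = n_atoms
        # Core annotations: one NumPy array per field, allocated up front.
        self.coord = np.zeros((n_atoms, 3), dtype=np.float64)
        self.atom_name = np.full(n_atoms, "", dtype="U4")
        self.element = np.full(n_atoms, "", dtype="U2")
        self.res_name = np.full(n_atoms, "", dtype="U3")
        self.res_id = np.zeros(n_atoms, dtype=np.int32)
        self.chain_id = np.full(n_atoms, "", dtype="U1")
        # Optional annotations are created only on first access.
        self._optional = {}

    def __getattr__(self, name):
        # Python calls __getattr__ only when normal lookup fails, so this
        # runs for optional annotations that have not been created yet.
        spec = type(self)._OPTIONAL.get(name)
        if spec is None:
            raise AttributeError(name)
        if name not in self._optional:
            dtype, default = spec
            self._optional[name] = np.full(self.n_atoms, default, dtype=dtype)
        return self._optional[name]
```

Keeping each field in its own contiguous array is what lets a mask such as element == "C" be computed in one vectorized pass, while columns nobody reads never get allocated.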

Example

>>> structure = Structure(n_atoms=100)
>>> structure.coord = np.random.rand(100, 3)
>>> structure.atom_name[:] = "CA"
>>> structure.add_annotation("custom_prop", dtype=np.float32, default_value=1.0)

Parameters:

n_atoms (int)

__init__(n_atoms)

Initialize Structure with core annotations only.

Parameters:

n_atoms (int) – Number of atoms in the structure

Raises:

ValueError – If n_atoms <= 0

property coord: ndarray

Atomic coordinates array (n_atoms, 3).

Returns:

NumPy array of atomic coordinates

add_annotation(name, dtype=numpy.float32, default_value=None)

Add custom annotation to structure.

Parameters:
  • name (str) – Name of the annotation

  • dtype (Any, default: numpy.float32) – NumPy data type for the annotation

  • default_value (Any, default: None) – Default value to fill the array (optional)

Raises:

ValueError – If annotation name already exists

Return type:

None

property is_backbone: ndarray

Boolean array indicating backbone atoms.

property is_sidechain: ndarray

Boolean array indicating sidechain atoms.

property is_aromatic: ndarray

Boolean array indicating aromatic atoms.

property is_ligand: ndarray

Boolean array indicating ligand atoms.

property residue_type: ndarray

String array with residue type classification (PROTEIN/DNA/RNA/LIGAND).

__getitem__(index)

Get subset of structure by index.

Parameters:

index (Union[int, slice, ndarray]) – Integer, slice, or boolean/integer array for indexing

Return type:

Structure

Returns:

New Structure containing selected atoms

Example

>>> subset = structure[structure.element == "C"]
>>> single_atom = structure[0]
>>> chain_a = structure[structure.chain_id == "A"]

__len__()

Return number of atoms in structure.

Return type:

int

copy()

Create deep copy of structure.

Return type:

Structure

Returns:

New Structure with copied data

get_center(weights=None)

Calculate geometric or mass-weighted center of structure.

Parameters:

weights (Optional[ndarray], default: None) – Optional weights for each atom (e.g., atomic masses)

Return type:

ndarray

Returns:

3D coordinate of center as numpy array

get_masses()

Get atomic masses for all atoms.

Return type:

ndarray

Returns:

Array of atomic masses in amu

translate(vector)

Translate structure by given vector.

Parameters:

vector (ndarray) – 3D translation vector

Return type:

None

center_at_origin(weights=None)

Center structure at origin.

Parameters:

weights (Optional[ndarray], default: None) – Optional weights for center calculation

Return type:

None

get_neighbors_within(atom_idx, radius)

Get atom indices within radius of specified atom.

Parameters:
  • atom_idx (int) – Index of query atom

  • radius (float) – Search radius in Angstroms

Return type:

ndarray

Returns:

Array of neighbor atom indices (excluding query atom)

Example

>>> neighbors = structure.get_neighbors_within(100, 5.0)

get_atoms_within_sphere(center, radius)

Get atoms within spherical region.

Parameters:
  • center (ndarray) – Center point of sphere (x, y, z)

  • radius (float) – Radius of sphere in Angstroms

Return type:

ndarray

Returns:

Array of atom indices within the sphere

Example

>>> center = np.array([10.0, 15.0, 20.0])
>>> atoms = structure.get_atoms_within_sphere(center, 8.0)
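Both radius queries above reduce to a distance test against every atom. The sketch below shows that logic with brute-force NumPy (O(N) per query); has_spatial_index() suggests MolR can use a SciPy-backed index instead, which should return the same atoms faster. The function names are hypothetical, and counting atoms that lie exactly on the boundary as inside (<=) is an assumption.

```python
import numpy as np


def neighbors_within(coord, atom_idx, radius):
    """Sketch: indices of atoms within `radius` of atom `atom_idx`, excluding itself."""
    d = np.linalg.norm(coord - coord[atom_idx], axis=1)
    mask = d <= radius
    mask[atom_idx] = False  # the query atom is not its own neighbor
    return np.flatnonzero(mask)


def atoms_within_sphere(coord, center, radius):
    """Sketch: indices of atoms inside a sphere of `radius` around `center`."""
    d = np.linalg.norm(coord - np.asarray(center, dtype=float), axis=1)
    return np.flatnonzero(d <= radius)
```

Here coord is the (n_atoms, 3) array exposed by Structure.coord; a spatial index only changes how candidate atoms are found, not which atoms pass the test.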

get_atoms_within_cog_sphere(selection, radius)

Get atoms within spherical zone centered at center of geometry of selection.

Parameters:
  • selection (ndarray) – Boolean mask or indices of atoms to define COG

  • radius (float) – Radius of spherical zone in Angstroms

Return type:

ndarray

Returns:

Array of atom indices within the COG sphere

Example

>>> active_site = structure.select("resname HIS")
>>> nearby = structure.get_atoms_within_cog_sphere(active_site, 10.0)
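A COG sphere is a two-step query: average the coordinates of the selected atoms, then run a sphere query around that point. A minimal sketch under those assumptions (unweighted mean, inclusive boundary); atoms_within_cog_sphere is a hypothetical name, not MolR's code.

```python
import numpy as np


def atoms_within_cog_sphere(coord, selection, radius):
    """Sketch: atoms within `radius` of the center of geometry of `selection`.

    `selection` may be a boolean mask or an integer index array; NumPy
    indexing treats both the same way here.
    """
    selected = coord[selection]
    if len(selected) == 0:
        raise ValueError("selection is empty")
    cog = selected.mean(axis=0)  # unweighted center of geometry
    d = np.linalg.norm(coord - cog, axis=1)
    return np.flatnonzero(d <= radius)
```

Because select() returns a boolean mask, its result can be passed straight in as `selection`.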

get_neighbors_for_atoms(atom_indices, radius)

Get neighbors for multiple atoms at once (batch operation).

Parameters:
  • atom_indices (ndarray) – Array of atom indices to query

  • radius (float) – Search radius in Angstroms

Return type:

Dict[int, ndarray]

Returns:

Dictionary mapping atom_idx -> array of neighbor indices

Example

>>> ca_atoms = np.flatnonzero(structure.select("name CA"))  # boolean mask -> indices
>>> neighbors = structure.get_neighbors_for_atoms(ca_atoms, 8.0)

get_closest_atoms(query_point, k=1)

Get k nearest atoms to a query point.

Parameters:
  • query_point (ndarray) – 3D coordinate to query

  • k (int, default: 1) – Number of nearest neighbors to return

Return type:

Tuple[ndarray, ndarray]

Returns:

Tuple of (distances, atom_indices) for k nearest atoms

Example

>>> center = np.array([0.0, 0.0, 0.0])
>>> distances, indices = structure.get_closest_atoms(center, k=5)
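A k-nearest query does not need a full sort of all distances. The sketch below uses np.argpartition to pull out the k smallest in linear time and sorts only those; it returns (distances, indices) in the order documented above. closest_atoms is a hypothetical name, and capping k at the number of atoms is an assumption about edge-case behavior.

```python
import numpy as np


def closest_atoms(coord, query_point, k=1):
    """Sketch: (distances, indices) of the k atoms nearest to query_point, nearest first."""
    if k < 1:
        raise ValueError("k must be at least 1")
    d = np.linalg.norm(coord - np.asarray(query_point, dtype=float), axis=1)
    k = min(k, len(d))
    # argpartition moves the k smallest distances to the front in O(N) ...
    idx = np.argpartition(d, k - 1)[:k]
    # ... and only those k entries need a full sort.
    idx = idx[np.argsort(d[idx])]
    return d[idx], idx
```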

get_atoms_between_selections(selection1, selection2, max_distance)

Find atoms from two selections within max_distance of each other.

Parameters:
  • selection1 (ndarray) – First selection (boolean mask or indices)

  • selection2 (ndarray) – Second selection (boolean mask or indices)

  • max_distance (float) – Maximum distance between an atom in selection1 and an atom in selection2

Return type:

Dict[str, ndarray]

Returns:

Dictionary with ‘selection1_atoms’, ‘selection2_atoms’, ‘distances’

Example

>>> protein = structure.select("protein")
>>> ligand = structure.select("resname LIG")
>>> contacts = structure.get_atoms_between_selections(protein, ligand, 5.0)
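The returned keys suggest a list of contacting pairs. Assuming that reading (one entry per atom pair, not per unique atom), a brute-force sketch of the computation follows; contacts_between and _as_indices are hypothetical names, and this is not MolR's implementation.

```python
import numpy as np


def _as_indices(selection):
    """Accept a boolean mask or an index array and return integer indices."""
    sel = np.asarray(selection)
    return np.flatnonzero(sel) if sel.dtype == bool else sel.astype(int)


def contacts_between(coord, selection1, selection2, max_distance):
    """Sketch: every (i, j) pair, i in selection1 and j in selection2, within max_distance."""
    idx1, idx2 = _as_indices(selection1), _as_indices(selection2)
    # Broadcast to an (n1, n2) distance matrix; memory grows as n1 * n2.
    d = np.linalg.norm(coord[idx1][:, None, :] - coord[idx2][None, :, :], axis=-1)
    i, j = np.nonzero(d <= max_distance)
    return {
        "selection1_atoms": idx1[i],
        "selection2_atoms": idx2[j],
        "distances": d[i, j],
    }
```

For large selections a spatial index avoids materializing the full n1 x n2 matrix.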

has_spatial_index()

Check if spatial index is available.

Return type:

bool

Returns:

True if scipy is available and spatial indexing is possible

get_bonds_to(other_atoms, max_distance=2.0)

Find potential bonds to other atoms based on distance.

Parameters:
  • other_atoms (ndarray) – Indices of other atoms to check

  • max_distance (float, default: 2.0) – Maximum distance for bond consideration

Return type:

ndarray

Returns:

Boolean array indicating which atoms have potential bonds

select(selection_string)

Select atoms using selection language.

Parameters:

selection_string (str) – Selection expression

Return type:

ndarray

Returns:

Boolean array of selected atoms

Examples

>>> mask = structure.select("protein and backbone")
>>> mask = structure.select("resname ALA GLY")
>>> mask = structure.select("chain A and resid 1:50")

Raises:

NotImplementedError – For unsupported selection syntax

get_annotation_info()

Get information about all annotations in the structure.

Return type:

Dict[str, Dict[str, Any]]

Returns:

Dictionary with annotation info including dtype and whether it’s initialized

__repr__()

String representation of Structure.

Return type:

str

__str__()

Detailed string representation.

Return type:

str

classmethod from_pdb(filename)

Create Structure from PDB file.

Parameters:

filename (str) – Path to PDB file

Return type:

Structure

Returns:

Structure object with all atoms and annotations

Raises:

ValueError – If PDB file contains multiple models

Example

>>> structure = Structure.from_pdb("example.pdb")
>>> print(f"Loaded {structure.n_atoms} atoms")

classmethod from_mmcif(filename)

Create Structure from mmCIF file.

Parameters:

filename (str) – Path to mmCIF file

Return type:

Structure

Returns:

Structure object with all atoms and annotations

Raises:

ValueError – If mmCIF file contains multiple models

Example

>>> structure = Structure.from_mmcif("example.cif")
>>> print(f"Loaded {structure.n_atoms} atoms")

classmethod from_pdb_string(pdb_content)

Create Structure from PDB content string.

Parameters:

pdb_content (str) – PDB file content as string

Return type:

Structure

Returns:

Structure object with all atoms and annotations

Raises:

ValueError – If PDB content contains multiple models

Example

>>> pdb_data = "ATOM      1  N   ALA A   1      20.154  16.967  22.478  1.00 10.00           N"
>>> structure = Structure.from_pdb_string(pdb_data)

classmethod from_mmcif_string(mmcif_content)

Create Structure from mmCIF content string.

Parameters:

mmcif_content (str) – mmCIF file content as string

Return type:

Structure

Returns:

Structure object with all atoms and annotations

Raises:

ValueError – If mmCIF content contains multiple models

Example

>>> mmcif_data = "data_test\nloop_\n_atom_site.group_PDB\n..."
>>> structure = Structure.from_mmcif_string(mmcif_data)

detect_bonds(vdw_factor=0.75, use_file_bonds=True, store_bonds=True)

Detect bonds using the simplified default detector.

Parameters:
  • vdw_factor (float, default: 0.75) – Factor for VdW radii in distance detection (0.0 < factor <= 1.0)

  • use_file_bonds (bool, default: True) – Whether to include file-based bonds (CONECT, mmCIF)

  • store_bonds (bool, default: True) – Whether to store detected bonds on structure

Return type:

molr.BondList

Returns:

BondList with detected bonds

Example

>>> structure = Structure.from_pdb("protein.pdb")
>>> bonds = structure.detect_bonds()
>>> print(f"Detected {len(bonds)} bonds")

property bonds: molr.BondList | None

Get bonds associated with this structure.

Returns:

BondList if bonds have been detected/assigned, None otherwise

has_bonds()

Check if structure has bond information.

Return type:

bool

Returns:

True if bonds are available

property file_bonds: molr.BondList | None

Get bonds loaded from structure files (PDB CONECT, mmCIF bonds).

Returns:

BondList with file-based bonds or None if not available

has_file_bonds()

Check if structure has file-based bond information.

Return type:

bool

Returns:

True if file bonds are available

The main class for representing molecular structures with spatial indexing capabilities.

Key Methods:

StructureEnsemble

class molr.StructureEnsemble(template, n_frames=0)

Bases: object

Ensemble of molecular structures representing trajectory data.

This class stores multiple frames of coordinate data while sharing annotations (atom names, elements, etc.) across all frames for memory efficiency. Designed for trajectory analysis and multi-model PDB files.

Memory layout:
  • coords: (n_frames, n_atoms, 3) array of coordinates

  • Annotations shared from template Structure

  • Optional time and box information per frame

Example

>>> ensemble = StructureEnsemble.from_structures([struct1, struct2])
>>> print(f"Trajectory with {ensemble.n_frames} frames")
>>> frame0 = ensemble[0]  # Returns Structure for frame 0

Parameters:
  • template (Structure)

  • n_frames (int, default: 0)

__init__(template, n_frames=0)

Initialize StructureEnsemble from template Structure.

Parameters:
  • template (Structure) – Template Structure with shared annotations

  • n_frames (int, default: 0) – Number of frames (default: 0 for dynamic growth)

classmethod from_pdb(filename)

Create StructureEnsemble from multi-model PDB file.

Parameters:

filename (str) – Path to multi-model PDB file

Return type:

StructureEnsemble

Returns:

StructureEnsemble with all models as frames

Raises:

ValueError – If the PDB file contains only a single model

classmethod from_pdb_string(pdb_content)

Create StructureEnsemble from multi-model PDB content string.

Parameters:

pdb_content (str) – PDB content string with multiple models

Return type:

StructureEnsemble

Returns:

StructureEnsemble with all models as frames

Raises:

ValueError – If the PDB content contains only a single model

classmethod from_mmcif(filename)

Create StructureEnsemble from multi-model mmCIF file.

Parameters:

filename (str) – Path to multi-model mmCIF file

Return type:

StructureEnsemble

Returns:

StructureEnsemble with all models as frames

Raises:

ValueError – If the mmCIF file contains only a single model

classmethod from_mmcif_string(mmcif_content)

Create StructureEnsemble from multi-model mmCIF content string.

Parameters:

mmcif_content (str) – mmCIF content string with multiple models

Return type:

StructureEnsemble

Returns:

StructureEnsemble with all models as frames

Raises:

ValueError – If the mmCIF content contains only a single model

classmethod from_structures(structures)

Create StructureEnsemble from list of Structure objects.

Parameters:

structures (List[Structure]) – List of Structure objects with same atoms

Return type:

StructureEnsemble

Returns:

StructureEnsemble with structures as frames

Raises:

ValueError – If structures have different atom counts

add_frame(structure, time=None)

Add a new frame to the ensemble.

Parameters:
  • structure (Structure) – Structure to add as new frame

  • time (Optional[float], default: None) – Optional time value for this frame

Raises:

ValueError – If the structure's atom count does not match the ensemble's

Return type:

None

__getitem__(index)

Get frame(s) from ensemble.

Parameters:

index (Union[int, slice]) – Frame index or slice

Return type:

Union[Structure, StructureEnsemble]

Returns:

Structure for single frame, StructureEnsemble for slice

Examples

>>> frame0 = ensemble[0]  # Single frame as Structure
>>> sub_traj = ensemble[10:20]  # Sub-trajectory as StructureEnsemble

__len__()

Return number of frames.

Return type:

int

__iter__()

Iterate over frames as Structure objects.

Return type:

Any

get_frame_coords(frame_index)

Get coordinates for specific frame.

Parameters:

frame_index (int) – Index of frame

Return type:

ndarray

Returns:

Coordinate array (n_atoms, 3) for the frame

set_frame_coords(frame_index, coords)

Set coordinates for specific frame.

Parameters:
  • frame_index (int) – Index of frame

  • coords (ndarray) – Coordinate array (n_atoms, 3)

Return type:

None

center_frames(selection=None)

Center all frames at origin.

Parameters:

selection (Optional[str], default: None) – Optional selection for center calculation (default: all atoms)

Return type:

None

rmsd(reference_frame=0, selection=None)

Calculate RMSD of each frame relative to reference.

Parameters:
  • reference_frame (int, default: 0) – Index of reference frame

  • selection (Optional[str], default: None) – Optional atom selection for RMSD calculation

Return type:

ndarray

Returns:

Array of RMSD values for each frame
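Because frames are stored as one (n_frames, n_atoms, 3) array, per-frame RMSD can be computed with a single broadcast. The sketch below assumes no superposition (frames are compared as stored) and takes a boolean atom mask in place of a selection string; whether rmsd() fits frames before comparing is not stated here, and rmsd_per_frame is a hypothetical name.

```python
import numpy as np


def rmsd_per_frame(coords, reference_frame=0, mask=None):
    """Sketch: RMSD of every frame to one reference frame, without superposition.

    coords has shape (n_frames, n_atoms, 3); mask is an optional boolean
    array over atoms that stands in for a selection string.
    """
    coords = np.asarray(coords, dtype=float)
    if mask is not None:
        coords = coords[:, np.asarray(mask, dtype=bool), :]
    diff = coords - coords[reference_frame]  # broadcasts over the frame axis
    # RMSD_f = sqrt(mean over atoms i of |x_f,i - x_ref,i|^2)
    return np.sqrt((diff ** 2).sum(axis=2).mean(axis=1))
```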

__repr__()

String representation.

Return type:

str

__str__()

Detailed string representation.

Return type:

str

Multi-model trajectory representation for handling structural ensembles.

Key Methods:

BondList

class molr.BondList(n_bonds=0)

Bases: object

Efficient storage and manipulation of molecular bonds with smart indexing.

The BondList class stores bonds as pairs of atom indices with additional metadata such as bond order, detection method, and confidence scores. It supports smart indexing that automatically adjusts bond indices when the parent structure is sliced or modified.

Bond storage uses Structure of Arrays (SoA) design:
  • bonds: (N, 2) array of atom index pairs

  • bond_order: Bond order (1=single, 2=double, 3=triple, 1.5=aromatic)

  • bond_type: Bond type classification

  • detection_method: How the bond was detected

  • confidence: Confidence score for bond existence

Smart indexing features:
  • Automatic bond index adjustment when structure is sliced

  • Efficient bond filtering based on atom selections

  • Bond validation against structure changes

Example

>>> bond_list = BondList()
>>> bond_list.add_bond(0, 1, bond_order=1.0, bond_type="covalent")
>>> bond_list.add_bonds([(2, 3), (3, 4)], bond_orders=[1.0, 2.0])
>>> subset_bonds = bond_list.filter_by_atoms([0, 1, 2])

Parameters:

n_bonds (int, default: 0)

__init__(n_bonds=0)

Initialize BondList.

Parameters:

n_bonds (int, default: 0) – Initial number of bonds (default: 0 for dynamic growth)

add_property(name, dtype=numpy.float32, default_value=None)

Add custom property to bonds.

Parameters:
  • name (str) – Name of the property

  • dtype (Any, default: numpy.float32) – NumPy data type for the property

  • default_value (Any, default: None) – Default value to fill existing bonds

Raises:

ValueError – If property name already exists

Return type:

None

add_bond(atom1, atom2, bond_order=1.0, bond_type='covalent', **kwargs)

Add a single bond.

Parameters:
  • atom1 (int) – Index of first atom

  • atom2 (int) – Index of second atom

  • bond_order (float, default: 1.0) – Bond order (1.0=single, 2.0=double, etc.)

  • bond_type (str, default: 'covalent') – Type of bond

  • **kwargs (Any) – Additional bond properties

Return type:

int

Returns:

Index of the added bond

Raises:

ValueError – If atoms are the same or invalid

add_bonds(bond_pairs, bond_orders=None, bond_types=None, **kwargs)

Add multiple bonds efficiently.

Parameters:
  • bond_pairs (List[Tuple[int, int]]) – List of (atom1, atom2) tuples

  • bond_orders (Optional[List[float]], default: None) – Optional list of bond orders (default: all 1.0)

  • bond_types (Optional[List[str]], default: None) – Optional list of bond types (default: all “covalent”)

  • **kwargs (Any) – Additional properties as lists

Return type:

ndarray

Returns:

Array of bond indices for added bonds

Raises:

ValueError – If list lengths don’t match

remove_bonds(bond_indices)

Remove bonds by index.

Parameters:

bond_indices (Union[int, List[int], ndarray[Any, Any]]) – Bond index or array of bond indices to remove

Return type:

None

get_bonds_for_atom(atom_index)

Get all bonds involving a specific atom.

Parameters:

atom_index (int) – Index of the atom

Return type:

ndarray

Returns:

Array of bond indices involving the atom

get_neighbors(atom_index)

Get neighbor atoms for a specific atom.

Parameters:

atom_index (int) – Index of the atom

Return type:

ndarray

Returns:

Array of neighbor atom indices

filter_by_atoms(atom_indices)

Create new BondList containing only bonds between specified atoms.

Parameters:

atom_indices (Union[List[int], ndarray]) – List or array of atom indices to keep

Return type:

molr.BondList

Returns:

New BondList with filtered bonds and remapped indices
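Remapping after filtering can be done with np.searchsorted over the sorted set of kept atoms. This is a sketch, assuming bonds are an (N, 2) integer array and that the filtered subset keeps atoms in ascending original order (so old index k becomes its position in the sorted set); filter_bonds_by_atoms is a hypothetical name, not MolR's implementation.

```python
import numpy as np


def filter_bonds_by_atoms(bonds, atom_indices):
    """Sketch: keep bonds with both atoms in atom_indices and renumber them.

    Assumes old index k maps to its position in sorted(set(atom_indices)).
    """
    bonds = np.asarray(bonds, dtype=int).reshape(-1, 2)
    keep = np.unique(np.asarray(atom_indices, dtype=int))  # sorted, no duplicates
    inside = np.isin(bonds, keep).all(axis=1)  # both endpoints retained
    # searchsorted gives each old index its position in `keep`.
    return np.searchsorted(keep, bonds[inside])
```

For example, keeping atoms [5, 1, 2, 0] turns the bond (2, 5) into (2, 3), and drops (5, 6) because atom 6 is gone.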

get_bond_matrix(n_atoms)

Create bond adjacency matrix.

Parameters:

n_atoms (int) – Total number of atoms in structure

Return type:

ndarray

Returns:

(n_atoms, n_atoms) boolean adjacency matrix
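A dense adjacency matrix is a direct scatter of the bond pairs into an n_atoms x n_atoms boolean array, set in both directions because bonds are undirected. A sketch with a hypothetical bond_matrix function; the bonded neighbors of atom a are then the True entries of row a.

```python
import numpy as np


def bond_matrix(bonds, n_atoms):
    """Sketch: symmetric boolean adjacency matrix from (N, 2) bond pairs."""
    bonds = np.asarray(bonds, dtype=int).reshape(-1, 2)
    m = np.zeros((n_atoms, n_atoms), dtype=bool)
    m[bonds[:, 0], bonds[:, 1]] = True
    m[bonds[:, 1], bonds[:, 0]] = True  # bonds are undirected
    return m


m = bond_matrix([[0, 1], [1, 2]], 4)
neighbors_of_1 = np.flatnonzero(m[1])  # atoms bonded to atom 1
```

Memory grows as n_atoms squared, so for very large structures a sparse representation or the bond list itself is the cheaper choice.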

validate_bonds(n_atoms)

Validate that all bonds reference valid atom indices.

Parameters:

n_atoms (int) – Number of atoms in the structure

Return type:

Tuple[bool, List[int]]

Returns:

Tuple of (all_valid, list_of_invalid_bond_indices)

__len__()

Return number of bonds.

Return type:

int

__getitem__(index)

Get bond(s) by index.

Parameters:

index (Union[int, slice, ndarray]) – Integer, slice, or array for indexing

Return type:

Union[Tuple[int, int], molr.BondList]

Returns:

Single bond tuple or new BondList with selected bonds

__repr__()

String representation of BondList.

Return type:

str

__str__()

Detailed string representation.

Return type:

str

Efficient storage and manipulation of molecular bonds.

Key Methods:

  • get_bonds_for_atom() - Get all bonds involving an atom

  • get_neighbors() - Get bonded neighbors

  • get_bond_matrix() - Convert to adjacency matrix

Bond Detection

DefaultBondDetector

class molr.bond_detection.DefaultBondDetector(vdw_factor=0.75)

Bases: object

Simplified bond detector using templates and distance criteria.

This replaces the complex hierarchical system with a straightforward approach:
  1. Apply residue templates (from residue_bonds.py or the CCD)

  2. Apply distance-based detection as a fallback

Parameters:

vdw_factor (float, default: 0.75)

__init__(vdw_factor=0.75)

Initialize the default bond detector.

Parameters:

vdw_factor (float, default: 0.75) – Factor for Van der Waals radii in distance detection (0.0 < factor <= 1.0). Default 0.75 works well for most cases.

detect_bonds(structure, use_file_bonds=True)

Detect bonds in a molecular structure.

Parameters:
  • structure (Structure) – Structure to analyze

  • use_file_bonds (bool, default: True) – Whether to include file-based bonds (CONECT, etc.)

Return type:

BondList

Returns:

BondList containing all detected bonds

Default bond detector that combines residue templates and distance-based detection.
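The distance-based fallback can be pictured as a pairwise cutoff scaled by vdw_factor. Both the exact criterion used here (d_ij < vdw_factor * (r_i + r_j)) and the radius table are assumptions for illustration, not taken from MolR; the radii are Bondi's van der Waals values for a few common elements, and distance_bonds is a hypothetical name.

```python
import numpy as np

# Bondi van der Waals radii in Angstroms; illustrative subset only.
VDW_RADII = {"H": 1.20, "C": 1.70, "N": 1.55, "O": 1.52, "S": 1.80}


def distance_bonds(coord, elements, vdw_factor=0.75):
    """Sketch: atom pairs (i, j), i < j, with d_ij < vdw_factor * (r_i + r_j)."""
    if not 0.0 < vdw_factor <= 1.0:
        raise ValueError("vdw_factor must satisfy 0 < factor <= 1")
    coord = np.asarray(coord, dtype=float)
    radii = np.array([VDW_RADII[e] for e in elements])
    d = np.linalg.norm(coord[:, None, :] - coord[None, :, :], axis=-1)
    cutoff = vdw_factor * (radii[:, None] + radii[None, :])
    # Upper triangle only: each pair once, no self-pairs.
    i, j = np.nonzero(np.triu(d < cutoff, k=1))
    return np.column_stack([i, j])
```

In the detector described above, pairs already covered by residue templates or file bonds would be handled before this fallback runs.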

Bond Detection Functions

molr.bond_detection.detect_bonds(structure, vdw_factor=0.75, use_file_bonds=True)

Convenience function to detect bonds in a structure.

Parameters:
  • structure (Structure) – Structure to analyze

  • vdw_factor (float, default: 0.75) – Factor for VdW radii in distance detection

  • use_file_bonds (bool, default: True) – Whether to include file-based bonds

Return type:

BondList

Returns:

BondList with detected bonds

Main function for bond detection in molecular structures.

I/O Parsers

PDB Parser

class molr.PDBParser

Bases: object

PDB file parser for the space module.

Designed specifically for the NumPy-based Structure class, this parser converts pdbreader output directly to NumPy arrays for optimal performance.

Features:
  • Direct conversion to NumPy arrays

  • CONECT record parsing for explicit bonds

  • Multi-model support for trajectories

  • Efficient memory usage

  • Full PDB annotation support

__init__()

Initialize the PDB parser.

parse_file(filename)

Parse a PDB file and return a Structure or, for a multi-model file, a StructureEnsemble.

Parameters:

filename (str) – Path to the PDB file

Return type:

Union[Structure, StructureEnsemble]

Returns:

Structure for a single-model file, StructureEnsemble for a multi-model file

Raises:

parse_string(pdb_content)

Parse PDB content from a string.

Parameters:

pdb_content (str) – PDB file content as string

Return type:

Union[Structure, StructureEnsemble]

Returns:

Structure for single-model content, StructureEnsemble for multi-model content

Parser for PDB format files with support for:

  • Multi-model structures

  • CONECT record parsing

  • Alternate conformations

  • Insertion codes

  • Crystal information
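PDB coordinate records use fixed columns, which is why a parser can slice each line rather than split on whitespace. A minimal sketch of reading one ATOM/HETATM line with the column layout from the wwPDB format specification (1-based inclusive columns converted to 0-based slices); the PDBParser above builds on pdbreader output instead, and parse_atom_record is a hypothetical name.

```python
def parse_atom_record(line: str) -> dict:
    """Sketch: read one ATOM/HETATM record by its fixed PDB columns."""
    if not line.startswith(("ATOM  ", "HETATM")):
        raise ValueError("not an ATOM/HETATM record")
    return {
        "serial": int(line[6:11]),           # columns 7-11
        "atom_name": line[12:16].strip(),    # columns 13-16
        "alt_loc": line[16].strip(),         # column 17
        "res_name": line[17:20].strip(),     # columns 18-20
        "chain_id": line[21].strip(),        # column 22
        "res_id": int(line[22:26]),          # columns 23-26
        "insertion_code": line[26].strip(),  # column 27
        "coord": (
            float(line[30:38]),              # x, columns 31-38
            float(line[38:46]),              # y, columns 39-46
            float(line[46:54]),              # z, columns 47-54
        ),
        "occupancy": float(line[54:60]),     # columns 55-60
        "b_factor": float(line[60:66]),      # columns 61-66
        "element": line[76:78].strip(),      # columns 77-78
    }
```

Applied to the ATOM line from the from_pdb_string example, this yields atom N of residue ALA 1 in chain A at (20.154, 16.967, 22.478).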

mmCIF Parser

class molr.mmCIFParser

Bases: object

mmCIF file parser for the space module.

Designed specifically for the NumPy-based Structure class, this parser converts mmcif output directly to NumPy arrays for optimal performance.

Features:
  • Direct conversion to NumPy arrays

  • Multi-model support for trajectories

  • Efficient memory usage

  • Full mmCIF annotation support

  • Chemical bond information from mmCIF data

__init__()

Initialize the mmCIF parser.

parse_file(filename)

Parse an mmCIF file and return a Structure or StructureEnsemble.

Parameters:

filename (str) – Path to the mmCIF file

Return type:

Union[Structure, StructureEnsemble]

Returns:

Structure object for single model, StructureEnsemble for multi-model

Raises:

parse_string(mmcif_content)

Parse mmCIF content from a string.

Parameters:

mmcif_content (str) – mmCIF file content as string

Return type:

Union[Structure, StructureEnsemble]

Returns:

Structure for single-model content, StructureEnsemble for multi-model content

Parser for mmCIF format files with support for:

  • Chemical bond information

  • Large structure handling

  • Complete metadata extraction

Selection System

Selection Engine

class molr.selection.SelectionEngine(cache_size=100)

Bases: object

Engine for evaluating atom selections on structures.

Provides caching and optimization for repeated selections.

Parameters:

cache_size (int, default: 100)

__init__(cache_size=100)

Initialize selection engine.

Parameters:

cache_size (int, default: 100) – Maximum number of cached selections

select(structure, selection)

Select atoms from a structure.

Parameters:
  • structure (Structure) – Structure to select atoms from

  • selection – Selection expression to evaluate

Return type:

ndarray

Returns:

Boolean array indicating selected atoms

Raises:

ParseException – If selection string is invalid

select_atoms(structure, selection)

Return a new Structure containing only selected atoms.

Parameters:
  • structure (Structure) – Structure to select atoms from

  • selection – Selection expression to evaluate

Return type:

Structure

Returns:

New Structure with selected atoms

count(structure, selection)

Count atoms matching selection.

Parameters:
  • structure (Structure) – Structure to select atoms from

  • selection – Selection expression to evaluate

Return type:

int

Returns:

Number of selected atoms

get_indices(structure, selection)

Get indices of atoms matching selection.

Parameters:
  • structure (Structure) – Structure to select atoms from

  • selection – Selection expression to evaluate

Return type:

ndarray

Returns:

Array of atom indices

clear_cache()

Clear the selection cache.

Return type:

None

Main engine for parsing and evaluating selection expressions.
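The caching mentioned for SelectionEngine can be sketched as a least-recently-used map from selection strings to masks, bounded by cache_size. The class below is hypothetical and is not MolR's engine; keying on id(structure) is a simplification that does not detect in-place edits (or an id reused after a structure is freed), which is one reason a clear_cache() method is useful.

```python
from collections import OrderedDict

import numpy as np


class CachedSelector:
    """Hypothetical LRU cache of selection masks keyed by (structure id, selection)."""

    def __init__(self, evaluate, cache_size: int = 100) -> None:
        # evaluate: any callable (structure, selection) -> boolean mask.
        self._evaluate = evaluate
        self._cache_size = cache_size
        self._cache = OrderedDict()

    def select(self, structure, selection: str) -> np.ndarray:
        key = (id(structure), selection)
        if key in self._cache:
            self._cache.move_to_end(key)  # mark as most recently used
            return self._cache[key].copy()
        mask = np.asarray(self._evaluate(structure, selection), dtype=bool)
        self._cache[key] = mask
        if len(self._cache) > self._cache_size:
            self._cache.popitem(last=False)  # evict the least recently used entry
        return mask.copy()  # callers cannot mutate the cached mask

    def clear_cache(self) -> None:
        self._cache.clear()
```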

Selection Functions

molr.selection.select(structure, selection)

Select atoms from a structure.

Parameters:
  • structure (Structure) – Structure to select atoms from

  • selection – Selection expression to evaluate

Return type:

ndarray

Returns:

Boolean array indicating selected atoms

Main selection function for atom queries.

molr.selection.select_atoms(structure, selection)

Return a new Structure containing only selected atoms.

Parameters:
  • structure (Structure) – Structure to select atoms from

  • selection – Selection expression to evaluate

Return type:

Structure

Returns:

New Structure with selected atoms

Like select(), but returns the selected atoms as a new Structure instead of a boolean mask.

Selection Parser

class molr.selection.SelectionParser

Bases: object

Parser for atom selection language.

Supports syntax like:
  • “protein and backbone”

  • “resname ALA GLY”

  • “chain A and resid 1:100”

  • “element C N O”

  • “not water”

  • “(protein and chain A) or ligand”

  • “byres (ligand and within 5 of protein)”

__init__()

Initialize the parser with grammar rules.

parse(selection_string)

Parse a selection string into a SelectionExpression.

Parameters:

selection_string (str) – The selection string to parse

Return type:

SelectionExpression

Returns:

SelectionExpression object

Raises:

ParseException – If the string cannot be parsed

classmethod parse_selection(selection_string)

Convenience class method to parse a selection string.

Parameters:

selection_string (str) – The selection string to parse

Return type:

SelectionExpression

Returns:

SelectionExpression object

pyparsing-based parser for selection language syntax.

Supported Expressions:

  • Atom properties: name, element, resname, chain

  • Spatial queries: within, around, cog

  • Boolean operations: and, or, not

  • Residue modifiers: byres

  • Predefined groups: protein, backbone, sidechain
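The byres modifier (as in the parser example "byres (ligand and within 5 of protein)") widens a selection to whole residues. A sketch of that step on plain arrays, assuming a residue is identified by chain ID plus residue number; insertion codes are ignored here for simplicity, and expand_by_residue is a hypothetical name.

```python
import numpy as np


def expand_by_residue(mask, chain_id, res_id):
    """Sketch of byres: grow an atom mask to cover every atom of each touched residue."""
    mask = np.asarray(mask, dtype=bool)
    # One key per atom, e.g. "A:42", identifying its residue.
    keys = np.array([f"{c}:{r}" for c, r in zip(chain_id, res_id)])
    touched = np.unique(keys[mask])  # residues with at least one selected atom
    return np.isin(keys, touched)
```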

Expression Classes

Base Expression

class molr.selection.SelectionExpression

Bases: ABC

Abstract base class for all selection expressions.

Selection expressions form a tree structure that can be evaluated against a Structure to produce a boolean mask indicating which atoms are selected.

abstractmethod evaluate(structure)

Evaluate the expression against a structure.

Parameters:

structure (Structure) – The molecular structure to evaluate against

Return type:

ndarray

Returns:

Boolean array with True for selected atoms

__and__(other)

Create AND expression using & operator.

Parameters:

other (SelectionExpression)

Return type:

SelectionExpression

__or__(other)

Create OR expression using | operator.

Parameters:

other (SelectionExpression)

Return type:

SelectionExpression

__invert__()

Create NOT expression using ~ operator.

Return type:

SelectionExpression

abstractmethod __repr__()

String representation of the expression.

Return type:

str
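The operator overloads above let expressions be combined into a tree whose evaluate() calls return boolean masks, merged with NumPy's &, | and ~. A miniature of that pattern follows, with a plain dict of arrays standing in for Structure; Expr, Field, And, Or and Not are hypothetical names, not MolR classes.

```python
from abc import ABC, abstractmethod

import numpy as np


class Expr(ABC):
    """Hypothetical miniature of the SelectionExpression pattern."""

    @abstractmethod
    def evaluate(self, atoms):
        """Return a boolean mask over the atoms."""

    def __and__(self, other):
        return And(self, other)

    def __or__(self, other):
        return Or(self, other)

    def __invert__(self):
        return Not(self)


class Field(Expr):
    """Leaf node: atoms whose annotation `name` takes one of `values`."""

    def __init__(self, name, values):
        self.name = name
        self.values = [values] if isinstance(values, str) else list(values)

    def evaluate(self, atoms):
        return np.isin(atoms[self.name], self.values)


class And(Expr):
    def __init__(self, left, right):
        self.left, self.right = left, right

    def evaluate(self, atoms):
        return self.left.evaluate(atoms) & self.right.evaluate(atoms)


class Or(Expr):
    def __init__(self, left, right):
        self.left, self.right = left, right

    def evaluate(self, atoms):
        return self.left.evaluate(atoms) | self.right.evaluate(atoms)


class Not(Expr):
    def __init__(self, operand):
        self.operand = operand

    def evaluate(self, atoms):
        return ~self.operand.evaluate(atoms)
```

Usage: Field("element", "C") & ~Field("chain_id", "B") builds the tree for "element C and not chain B" without evaluating anything until evaluate() is called.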

Atom Property Expressions

class molr.selection.ElementExpression(elements)

Bases: SelectionExpression

Select atoms by element type.

Parameters:

elements (Union[str, List[str]])

__init__(elements)

Initialize element selection.

Parameters:

elements (Union[str, List[str]]) – Element symbol(s) to select

evaluate(structure)

Select atoms matching the specified elements.

Parameters:

structure (Structure)

Return type:

ndarray

class molr.selection.AtomNameExpression(names)

Bases: SelectionExpression

Select atoms by atom name.

Parameters:

names (Union[str, List[str]])

__init__(names)

Initialize atom name selection.

Parameters:

names (Union[str, List[str]]) – Atom name(s) to select

evaluate(structure)

Select atoms matching the specified names.

Parameters:

structure (Structure)

Return type:

ndarray

class molr.selection.ResidueNameExpression(resnames)

Bases: SelectionExpression

Select atoms by residue name.

Parameters:

resnames (Union[str, List[str]])

__init__(resnames)

Initialize residue name selection.

Parameters:

resnames (Union[str, List[str]]) – Residue name(s) to select

evaluate(structure)

Select atoms in residues matching the specified names.

Parameters:

structure (Structure)

Return type:

ndarray

class molr.selection.ResidueIdExpression(resids)

Bases: SelectionExpression

Select atoms by residue ID.

Parameters:

resids (Union[int, List[int], range])

__init__(resids)

Initialize residue ID selection.

Parameters:

resids (Union[int, List[int], range]) – Residue ID(s) to select

evaluate(structure)

Select atoms in residues matching the specified IDs.

Parameters:

structure (Structure)

Return type:

ndarray

class molr.selection.ChainExpression(chains)

Bases: SelectionExpression

Select atoms by chain ID.

Parameters:

chains (Union[str, List[str]])

__init__(chains)

Initialize chain selection.

Parameters:

chains (Union[str, List[str]]) – Chain ID(s) to select

evaluate(structure)

Select atoms in the specified chains.

Parameters:

structure (Structure)

Return type:

ndarray

class molr.selection.IndexExpression(indices)

Bases: SelectionExpression

Select atoms by index.

Parameters:

indices (Union[int, List[int], range, slice])

__init__(indices)

Initialize index selection.

Parameters:

indices (Union[int, List[int], range, slice]) – Atom indices to select

evaluate(structure)

Select atoms at the specified indices.

Parameters:

structure (Structure)

Return type:

ndarray

Structural Expressions

class molr.selection.BackboneExpression

Bases: SelectionExpression

Select backbone atoms.

evaluate(structure)

Select atoms that are part of the backbone.

Parameters:

structure (Structure)

Return type:

ndarray

class molr.selection.SidechainExpression

Bases: SelectionExpression

Select sidechain atoms.

evaluate(structure)

Select atoms that are part of sidechains.

Parameters:

structure (Structure)

Return type:

ndarray

class molr.selection.ProteinExpression

Bases: SelectionExpression

Select protein atoms.

evaluate(structure)

Select atoms that are part of protein residues.

Parameters:

structure (Structure)

Return type:

ndarray

class molr.selection.NucleicExpression

Bases: SelectionExpression

Select nucleic acid atoms.

evaluate(structure)[source]#

Select atoms that are part of DNA or RNA.

Parameters:

structure (Structure)

Return type:

ndarray

class molr.selection.DNAExpression[source]#

Bases: SelectionExpression

Select DNA atoms.

evaluate(structure)[source]#

Select atoms that are part of DNA.

Parameters:

structure (Structure)

Return type:

ndarray

class molr.selection.RNAExpression[source]#

Bases: SelectionExpression

Select RNA atoms.

evaluate(structure)[source]#

Select atoms that are part of RNA.

Parameters:

structure (Structure)

Return type:

ndarray

class molr.selection.LigandExpression[source]#

Bases: SelectionExpression

Select ligand atoms.

evaluate(structure)[source]#

Select atoms that are part of ligands.

Parameters:

structure (Structure)

Return type:

ndarray

class molr.selection.AromaticExpression[source]#

Bases: SelectionExpression

Select aromatic atoms.

evaluate(structure)[source]#

Select atoms that are part of aromatic systems.

Parameters:

structure (Structure)

Return type:

ndarray

class molr.selection.WaterExpression[source]#

Bases: SelectionExpression

Select water molecules.

evaluate(structure)[source]#

Select atoms that are part of water molecules.

Parameters:

structure (Structure)

Return type:

ndarray
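Water selection is essentially a residue-name membership test against `WATER_MOLECULES` (see PDB Constants below). One detail is worth keeping in mind. `Structure` is described as storing `res_name` as `U3` strings, and NumPy silently truncates longer strings to fit that width, so `TIP3` is stored as `TIP`. The sketch below illustrates only this NumPy behaviour and one possible workaround. How MolR itself handles four-letter water names is not stated on this page.

```python
import numpy as np

# Values copied from molr.constants.pdb_constants.WATER_MOLECULES on this page.
WATER_MOLECULES = ["HOH", "WAT", "DOD", "TIP3", "TIP4", "TIP5", "W"]

# A U3 array truncates "TIP3" to "TIP" when the array is created.
res_name = np.array(["ALA", "HOH", "TIP3", "W"], dtype="U3")
print(res_name)  # ['ALA' 'HOH' 'TIP' 'W']

# Truncating the reference names the same way keeps four-letter codes matchable.
water_names = np.array(WATER_MOLECULES, dtype="U3")
is_water = np.isin(res_name, water_names)
print(is_water)  # [False  True  True  True]
```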

Boolean Expressions#

class molr.selection.AndExpression(left, right)[source]#

Bases: SelectionExpression

Logical AND of two expressions.

Parameters:

left (SelectionExpression)

right (SelectionExpression)

__init__(left, right)[source]#

Initialize AND expression.

Parameters:

left (SelectionExpression) – Left operand

right (SelectionExpression) – Right operand

evaluate(structure)[source]#

Return atoms selected by both expressions.

Parameters:

structure (Structure)

Return type:

ndarray

class molr.selection.OrExpression(left, right)[source]#

Bases: SelectionExpression

Logical OR of two expressions.

Parameters:

left (SelectionExpression)

right (SelectionExpression)

__init__(left, right)[source]#

Initialize OR expression.

Parameters:

left (SelectionExpression) – Left operand

right (SelectionExpression) – Right operand

evaluate(structure)[source]#

Return atoms selected by either expression.

Parameters:

structure (Structure)

Return type:

ndarray

class molr.selection.NotExpression(operand)[source]#

Bases: SelectionExpression

Logical NOT of an expression.

Parameters:

operand (SelectionExpression)

__init__(operand)[source]#

Initialize NOT expression.

Parameters:

operand (SelectionExpression) – Expression to negate

evaluate(structure)[source]#

Return atoms not selected by the expression.

Parameters:

structure (Structure)

Return type:

ndarray
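The Boolean expressions combine child expressions into a tree, and evaluation reduces to element-wise operations on boolean masks. The sketch below mirrors that pattern with hypothetical stand-in classes (`Expr`, `Chain`, `Name`, `And`, `Or`, `Not`) and a plain dict in place of `Structure`. The operator overloads (`&`, `|`, `~`) are an assumption of the sketch and are not a documented MolR feature.

```python
from typing import Dict

import numpy as np


class Expr:
    """Hypothetical base class mirroring the SelectionExpression pattern."""

    def evaluate(self, structure: Dict[str, np.ndarray]) -> np.ndarray:
        raise NotImplementedError

    # Operator overloads are an assumption made for this sketch only.
    def __and__(self, other: "Expr") -> "Expr":
        return And(self, other)

    def __or__(self, other: "Expr") -> "Expr":
        return Or(self, other)

    def __invert__(self) -> "Expr":
        return Not(self)


class Chain(Expr):
    def __init__(self, chain: str):
        self.chain = chain

    def evaluate(self, structure):
        return structure["chain_id"] == self.chain


class Name(Expr):
    def __init__(self, name: str):
        self.name = name

    def evaluate(self, structure):
        return structure["atom_name"] == self.name


class And(Expr):
    def __init__(self, left: Expr, right: Expr):
        self.left, self.right = left, right

    def evaluate(self, structure):
        # Element-wise AND of the two child masks.
        return self.left.evaluate(structure) & self.right.evaluate(structure)


class Or(Expr):
    def __init__(self, left: Expr, right: Expr):
        self.left, self.right = left, right

    def evaluate(self, structure):
        # Element-wise OR of the two child masks.
        return self.left.evaluate(structure) | self.right.evaluate(structure)


class Not(Expr):
    def __init__(self, operand: Expr):
        self.operand = operand

    def evaluate(self, structure):
        # Element-wise negation of the child mask.
        return ~self.operand.evaluate(structure)


structure = {
    "chain_id": np.array(["A", "A", "B", "B"]),
    "atom_name": np.array(["CA", "CB", "CA", "CB"]),
}
expr = Chain("A") & ~Name("CB")
print(expr.evaluate(structure))  # [ True False False False]
```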

Special Expressions#

class molr.selection.AllExpression[source]#

Bases: SelectionExpression

Select all atoms.

evaluate(structure)[source]#

Return True for all atoms.

Parameters:

structure (Structure)

Return type:

ndarray

class molr.selection.NoneExpression[source]#

Bases: SelectionExpression

Select no atoms.

evaluate(structure)[source]#

Return False for all atoms.

Parameters:

structure (Structure)

Return type:

ndarray

class molr.selection.ByResidueExpression(atom_selection)[source]#

Bases: SelectionExpression

Select complete residues based on atom selection.

Parameters:

atom_selection (SelectionExpression)

__init__(atom_selection)[source]#

Initialize by-residue selection.

Parameters:

atom_selection (SelectionExpression) – Expression to identify residues

evaluate(structure)[source]#

Select all atoms in residues that have any selected atoms.

Parameters:

structure (Structure)

Return type:

ndarray
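One way to expand an atom-level mask to complete residues is to group atoms by residue key and keep every group that contains at least one selected atom. This sketch assumes a residue is identified by (`chain_id`, `res_id`) and ignores insertion codes. `by_residue` is a hypothetical helper.

```python
import numpy as np


def by_residue(chain_id: np.ndarray, res_id: np.ndarray, atom_mask: np.ndarray) -> np.ndarray:
    """Hypothetical helper: expand an atom mask to whole residues."""
    # One key per atom; residues in different chains with the same res_id stay distinct.
    keys = np.array([f"{c}:{r}" for c, r in zip(chain_id, res_id)])
    # `group` maps every atom to the index of its residue among the unique keys.
    _, group = np.unique(keys, return_inverse=True)
    # A residue is selected if any of its atoms is selected.
    selected_groups = np.unique(group[atom_mask])
    return np.isin(group, selected_groups)


chain_id = np.array(["A", "A", "A", "B", "B"])
res_id = np.array([1, 1, 2, 1, 1])
atom_mask = np.array([False, True, False, False, False])
print(by_residue(chain_id, res_id, atom_mask))  # [ True  True False False False]
```

Residue 1 of chain B is not selected, which is why the chain identifier belongs in the residue key.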

Utilities#

Atom Utilities#

Atom Utilities

This module contains utility functions for working with PDB atoms and elements.

molr.utilities.atom_utils.get_element_from_pdb_atom(atom_name)[source]#

Map PDB atom name to chemical element using regex patterns.

This function uses regular expressions to identify the element type from PDB atom naming conventions, handling complex cases such as:

  • Greek letter remoteness indicators (CA, CB, CG, CD, CE, CZ, CH)

  • Numbered variants (C1’, H2’’, OP1, etc.)

  • Ion charges (CA2+, MG2+, etc.)

  • IUPAC hydrogen naming conventions

Parameters:

atom_name (str) – PDB atom name (e.g., ‘CA’, ‘OP1’, “H2''”, ‘CA2+’)

Returns:

Chemical element symbol (e.g., ‘C’, ‘O’, ‘H’, or ‘CA’ for calcium)

Return type:

str

Examples

>>> get_element_from_pdb_atom('CA')
'C'
>>> get_element_from_pdb_atom('OP1')
'O'
>>> get_element_from_pdb_atom('CA2+')
'CA'
>>> get_element_from_pdb_atom("H2''")
'H'

molr.utilities.atom_utils.pdb_atom_to_element(atom_name)[source]#

High-performance mapping of PDB atom name to chemical element.

Uses a pre-computed dictionary for common atoms and falls back to regex-based pattern matching for less common cases.

Parameters:

atom_name (str) – PDB atom name

Returns:

Chemical element symbol

Return type:

str

Key Functions:

  • classify_atom_types() - Classify atoms as backbone or sidechain

  • calculate_center_of_geometry() - Center of geometry (COG) calculation

  • get_vdw_radius() - Van der Waals radius lookup
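The two-stage strategy described for `pdb_atom_to_element()`, a dictionary lookup first and a regex fallback second, can be sketched as below. The patterns, the small `COMMON` table and the helper name `atom_name_to_element` are illustrative assumptions. They are deliberately simpler than MolR's rules: for example, the first-letter fallback would read an uncharged `FE` as fluorine.

```python
import re

# Small subset of PDB_ATOM_TO_ELEMENT from this page (the full table is larger).
COMMON = {"CA": "C", "CB": "C", "N": "N", "O": "O", "OP1": "O", "SG": "S", "CL": "CL"}

# Ion names carry an explicit charge, e.g. "CA2+", "MG2+", "CL-".
ION_RE = re.compile(r"^([A-Z]{1,2})\d*[+-]$")
# Otherwise take the first letter after any leading digits, e.g. "1HB" or "H2''".
FIRST_LETTER_RE = re.compile(r"^\d*([A-Z])")


def atom_name_to_element(atom_name: str) -> str:
    """Hypothetical dictionary-then-regex lookup (not MolR's implementation)."""
    name = atom_name.strip().upper()
    if name in COMMON:  # fast path for frequent names
        return COMMON[name]
    ion = ION_RE.match(name)
    if ion:  # charged ion: keep the full element symbol
        return ion.group(1)
    letter = FIRST_LETTER_RE.match(name)
    if letter:
        return letter.group(1)
    raise ValueError(f"cannot infer element from atom name {atom_name!r}")


for name in ["CA", "OP1", "CA2+", "H2''", "1HB"]:
    print(name, "->", atom_name_to_element(name))
```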

Constants#

Atomic Data#

Atomic Data Constants

This module contains atomic properties and constants for all elements commonly found in protein, DNA, RNA, and water molecules in PDB structures.

class molr.constants.atomic_data.AtomicData[source]#

Atomic properties and constants.

This class contains atomic data for all elements commonly found in protein, DNA, RNA, and water molecules in PDB structures.

COVALENT_RADII = {'BR': 1.14, 'C': 0.76, 'CA': 1.76, 'CL': 0.99, 'CO': 1.26, 'CU': 1.32, 'D': 0.31, 'F': 0.57, 'FE': 1.32, 'H': 0.31, 'I': 1.33, 'K': 2.03, 'MG': 1.41, 'MN': 1.39, 'N': 0.71, 'NA': 1.66, 'NI': 1.24, 'O': 0.66, 'P': 1.07, 'S': 1.05, 'ZN': 1.22}#
VDW_RADII = {'AL': 1.84, 'AR': 1.88, 'AS': 1.85, 'AT': 2.02, 'AU': 2.1, 'B': 1.92, 'BA': 2.68, 'BE': 1.53, 'BI': 2.07, 'BR': 1.83, 'C': 1.7, 'CA': 2.31, 'CL': 1.75, 'CO': 2.0, 'CS': 3.43, 'CU': 2.0, 'F': 1.47, 'FE': 2.05, 'FR': 3.48, 'GA': 1.87, 'GE': 2.11, 'H': 1.1, 'HE': 1.4, 'I': 1.98, 'IN': 1.93, 'K': 2.75, 'KR': 2.02, 'LI': 1.81, 'MG': 1.73, 'MN': 2.05, 'MO': 2.1, 'N': 1.55, 'NA': 2.27, 'NE': 1.54, 'NI': 2.0, 'O': 1.52, 'P': 1.8, 'PB': 2.02, 'PO': 1.97, 'PT': 2.05, 'RA': 2.83, 'RB': 3.03, 'RN': 2.2, 'RU': 2.05, 'S': 1.8, 'SB': 2.06, 'SE': 1.9, 'SI': 2.1, 'SN': 2.17, 'SR': 2.49, 'TE': 2.06, 'TL': 1.96, 'W': 2.1, 'XE': 2.16, 'ZN': 2.1}#
ELECTRONEGATIVITY = {'BR': 2.96, 'C': 2.55, 'CA': 1.0, 'CL': 3.16, 'CO': 1.88, 'CU': 1.9, 'D': 2.2, 'F': 3.98, 'FE': 1.83, 'H': 2.2, 'I': 2.66, 'K': 0.82, 'MG': 1.31, 'MN': 1.55, 'N': 3.04, 'NA': 0.93, 'NI': 1.91, 'O': 3.44, 'P': 2.19, 'S': 2.58, 'ZN': 1.65}#
ATOMIC_MASSES = {'BR': 79.904, 'C': 12.011, 'CA': 40.078, 'CL': 35.453, 'CO': 58.933, 'CU': 63.546, 'D': 2.014, 'F': 18.998, 'FE': 55.845, 'H': 1.008, 'I': 126.904, 'K': 39.098, 'MG': 24.305, 'MN': 54.938, 'N': 14.007, 'NA': 22.99, 'NI': 58.693, 'O': 15.999, 'P': 30.974, 'S': 32.065, 'ZN': 65.38}#
DEFAULT_ATOMIC_MASS = 12.011#
MIN_HYDROGEN_RATIO = 0.25#
METAL_ELEMENTS = {'CA', 'CO', 'CU', 'FE', 'K', 'MG', 'MN', 'NA', 'NI', 'ZN'}#

Standard atomic properties and constants.

Available Data:

  • Element symbols (dictionary keys) and the set of metal elements

  • Van der Waals radii

  • Covalent radii

  • Pauling electronegativities

  • Atomic masses
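The tables above are plain dictionaries keyed by upper-case element symbol, so per-atom properties can be looked up directly. The sketch below computes total mass and center of mass. It assumes, based on its value (the mass of carbon), that `DEFAULT_ATOMIC_MASS` is meant as the fallback for elements missing from `ATOMIC_MASSES`. The helpers are hypothetical.

```python
import numpy as np

# Subset of AtomicData.ATOMIC_MASSES and DEFAULT_ATOMIC_MASS from this page.
ATOMIC_MASSES = {"C": 12.011, "H": 1.008, "N": 14.007, "O": 15.999, "S": 32.065}
DEFAULT_ATOMIC_MASS = 12.011


def masses_for(elements) -> np.ndarray:
    # Unknown elements fall back to the default (carbon) mass.
    return np.array([ATOMIC_MASSES.get(e.upper(), DEFAULT_ATOMIC_MASS) for e in elements])


def center_of_mass(coord: np.ndarray, elements) -> np.ndarray:
    m = masses_for(elements)
    # Mass-weighted average of the (n_atoms, 3) coordinate array.
    return (m[:, None] * coord).sum(axis=0) / m.sum()


elements = ["O", "H", "H"]
coord = np.array([[0.0, 0.0, 0.0], [0.96, 0.0, 0.0], [-0.24, 0.93, 0.0]])
print(round(masses_for(elements).sum(), 3))  # 18.015
print(center_of_mass(coord, elements))
```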

Bond Parameters#

Bond detection parameters and constants.

This module provides constants for bond detection algorithms including distance thresholds, quality assessment parameters, and validation rules.

Bond length and angle parameters for different atom types.

Available Data:

  • Standard bond lengths

  • Bond angle preferences

  • Distance cutoffs for bond detection
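A common distance criterion for covalent bonds treats two atoms as bonded when their distance is at most the sum of their covalent radii plus a tolerance. The sketch uses radii from `AtomicData.COVALENT_RADII` above. The 0.45 Å tolerance, the brute-force distance matrix and the helper `detect_bonds` are assumptions for illustration and are not MolR's documented parameters.

```python
import numpy as np

# Subset of AtomicData.COVALENT_RADII (Å) from this page.
COVALENT_RADII = {"C": 0.76, "N": 0.71, "O": 0.66, "H": 0.31, "S": 1.05}
TOLERANCE = 0.45  # Å; an assumed slack, not a documented MolR parameter


def detect_bonds(coord: np.ndarray, elements, tolerance: float = TOLERANCE):
    """Hypothetical O(n^2) covalent-bond test: d(i, j) <= r_i + r_j + tolerance."""
    radii = np.array([COVALENT_RADII[e] for e in elements])
    # Pairwise distance matrix via broadcasting.
    diff = coord[:, None, :] - coord[None, :, :]
    dist = np.linalg.norm(diff, axis=-1)
    cutoff = radii[:, None] + radii[None, :] + tolerance
    # Upper triangle without the diagonal: each pair is reported once.
    i, j = np.nonzero(np.triu(dist <= cutoff, k=1))
    return list(zip(i.tolist(), j.tolist()))


# A C-C pair at 1.54 Å plus a distant oxygen.
coord = np.array([[0.0, 0.0, 0.0], [1.54, 0.0, 0.0], [5.0, 0.0, 0.0]])
print(detect_bonds(coord, ["C", "C", "O"]))  # [(0, 1)]
```

The full distance matrix grows quadratically with the number of atoms. For large structures a spatial index, such as a k-d tree, is the usual replacement.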

PDB Constants#

PDB Structure Constants

This module contains constants specifically related to PDB file processing, including residue mappings, atom classifications, and molecular recognition patterns used throughout MolR’s structure analysis components.

molr.constants.pdb_constants.PROTEIN_SUBSTITUTIONS: Dict[str, str] = {'2AS': 'ASP', '3AH': 'HIS', '5HP': 'GLU', '5OW': 'LYS', 'ACL': 'ARG', 'AGM': 'ARG', 'AIB': 'ALA', 'ALM': 'ALA', 'ALO': 'THR', 'ALY': 'LYS', 'ARM': 'ARG', 'ASA': 'ASP', 'ASB': 'ASP', 'ASK': 'ASP', 'ASL': 'ASP', 'ASQ': 'ASP', 'AYA': 'ALA', 'BCS': 'CYS', 'BHD': 'ASP', 'BMT': 'THR', 'BNN': 'ALA', 'BUC': 'CYS', 'BUG': 'LEU', 'C5C': 'CYS', 'C6C': 'CYS', 'CAS': 'CYS', 'CCS': 'CYS', 'CEA': 'CYS', 'CGU': 'GLU', 'CHG': 'ALA', 'CLE': 'LEU', 'CME': 'CYS', 'CSD': 'ALA', 'CSO': 'CYS', 'CSP': 'CYS', 'CSS': 'CYS', 'CSW': 'CYS', 'CSX': 'CYS', 'CXM': 'MET', 'CY1': 'CYS', 'CY3': 'CYS', 'CYG': 'CYS', 'CYM': 'CYS', 'CYQ': 'CYS', 'DAH': 'PHE', 'DAL': 'ALA', 'DAR': 'ARG', 'DAS': 'ASP', 'DCY': 'CYS', 'DGL': 'GLU', 'DGN': 'GLN', 'DHA': 'ALA', 'DHI': 'HIS', 'DIL': 'ILE', 'DIV': 'VAL', 'DLE': 'LEU', 'DLY': 'LYS', 'DNP': 'ALA', 'DPN': 'PHE', 'DPR': 'PRO', 'DSN': 'SER', 'DSP': 'ASP', 'DTH': 'THR', 'DTR': 'TRP', 'DTY': 'TYR', 'DVA': 'VAL', 'EFC': 'CYS', 'FLA': 'ALA', 'FME': 'MET', 'GGL': 'GLU', 'GL3': 'GLY', 'GLZ': 'GLY', 'GMA': 'GLU', 'GSC': 'GLY', 'HAC': 'ALA', 'HAR': 'ARG', 'HIC': 'HIS', 'HIP': 'HIS', 'HMR': 'ARG', 'HPQ': 'PHE', 'HTR': 'TRP', 'HYP': 'PRO', 'IAS': 'ASP', 'IIL': 'ILE', 'IYR': 'TYR', 'KCX': 'LYS', 'LLP': 'LYS', 'LLY': 'LYS', 'LTR': 'TRP', 'LYM': 'LYS', 'LYZ': 'LYS', 'MAA': 'ALA', 'MEN': 'ASN', 'MHS': 'HIS', 'MIS': 'SER', 'MK8': 'LEU', 'MLE': 'LEU', 'MPQ': 'GLY', 'MSA': 'GLY', 'MSE': 'MET', 'MVA': 'VAL', 'NEM': 'HIS', 'NEP': 'HIS', 'NLE': 'LEU', 'NLN': 'LEU', 'NLP': 'LEU', 'NMC': 'GLY', 'OAS': 'SER', 'OCS': 'CYS', 'OMT': 'MET', 'PAQ': 'TYR', 'PCA': 'GLU', 'PEC': 'CYS', 'PHI': 'PHE', 'PHL': 'PHE', 'PR3': 'CYS', 'PRR': 'ALA', 'PTR': 'TYR', 'PYX': 'CYS', 'SAC': 'SER', 'SAR': 'GLY', 'SCH': 'CYS', 'SCS': 'CYS', 'SCY': 'CYS', 'SEL': 'SER', 'SEP': 'SER', 'SET': 'SER', 'SHC': 'CYS', 'SHR': 'LYS', 'SMC': 'CYS', 'SOC': 'CYS', 'STY': 'TYR', 'SVA': 'SER', 'TIH': 'ALA', 'TPL': 'TRP', 'TPO': 'THR', 'TPQ': 'ALA', 'TRG': 'LYS', 'TRO': 'TRP', 'TYB': 'TYR', 'TYI': 'TYR', 'TYQ': 'TYR', 'TYS': 'TYR', 'TYY': 'TYR'}#

Mapping of non-standard protein residue codes to their standard amino acid equivalents.

This comprehensive dictionary provides substitutions for modified, methylated, phosphorylated, and other chemically altered amino acid residues commonly found in PDB structures. Used by PDB fixing operations to standardize protein residue names for consistent analysis.

Examples

  • MSE (selenomethionine) → MET (methionine)

  • CSO (cysteine sulfenic acid) → CYS (cysteine)

  • HYP (hydroxyproline) → PRO (proline)

  • PCA (pyroglutamic acid) → GLU (glutamic acid)

Note: This dictionary contains only protein residue substitutions. Nucleotide modifications are handled separately.

Type:

Dict[str, str]
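Applying the mapping is a dictionary lookup that leaves unknown names unchanged. The sketch below renames residues only, using a few entries copied from the table. `standardize_residue_names` is a hypothetical helper. A real fixing step probably also has to adjust atoms, since MSE carries SE where MET has SD; that part is not shown.

```python
from typing import Dict, List

# Four entries copied from PROTEIN_SUBSTITUTIONS on this page.
PROTEIN_SUBSTITUTIONS: Dict[str, str] = {"MSE": "MET", "CSO": "CYS", "HYP": "PRO", "PCA": "GLU"}


def standardize_residue_names(res_names: List[str]) -> List[str]:
    """Hypothetical helper: map modified residues to their parents, keep everything else."""
    return [PROTEIN_SUBSTITUTIONS.get(name, name) for name in res_names]


print(standardize_residue_names(["ALA", "MSE", "HYP", "HOH"]))
# ['ALA', 'MET', 'PRO', 'HOH']
```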

molr.constants.pdb_constants.PROTEIN_RESIDUES: List[str] = ['ALA', 'ASN', 'CYS', 'GLU', 'HIS', 'LEU', 'MET', 'PRO', 'THR', 'TYR', 'ARG', 'ASP', 'GLN', 'GLY', 'ILE', 'LYS', 'PHE', 'SER', 'TRP', 'VAL']#

Standard three-letter codes for the 20 canonical amino acid residues.

This list contains the 20 canonical amino acids in the standard three-letter abbreviation format used in PDB files. It is used for residue type validation, protein chain identification, and analysis scope determination.

The 20 amino acids are:
  • Alanine (ALA), Arginine (ARG), Asparagine (ASN), Aspartic acid (ASP)

  • Cysteine (CYS), Glutamic acid (GLU), Glutamine (GLN), Glycine (GLY)

  • Histidine (HIS), Isoleucine (ILE), Leucine (LEU), Lysine (LYS)

  • Methionine (MET), Phenylalanine (PHE), Proline (PRO), Serine (SER)

  • Threonine (THR), Tryptophan (TRP), Tyrosine (TYR), Valine (VAL)

Type:

List[str]

molr.constants.pdb_constants.RNA_RESIDUES: List[str] = ['A', 'G', 'C', 'U', 'I']#

Standard single-letter codes for RNA nucleotide residues.

Contains the five RNA nucleotides commonly found in PDB structures:
  • A (Adenine): Purine base forming A-U base pairs

  • G (Guanine): Purine base forming G-C base pairs

  • C (Cytosine): Pyrimidine base forming C-G base pairs

  • U (Uracil): Pyrimidine base forming U-A base pairs

  • I (Inosine): Modified nucleotide, wobble base pairing

Used for nucleic acid chain identification and RNA structure analysis.

Type:

List[str]

molr.constants.pdb_constants.DNA_RESIDUES: List[str] = ['DA', 'DG', 'DC', 'DT', 'DI']#

Standard two-letter codes for DNA nucleotide residues.

Contains the five DNA nucleotides commonly found in PDB structures:
  • DA (Deoxyadenosine): Purine base forming A-T base pairs

  • DG (Deoxyguanosine): Purine base forming G-C base pairs

  • DC (Deoxycytidine): Pyrimidine base forming C-G base pairs

  • DT (Deoxythymidine): Pyrimidine base forming T-A base pairs

  • DI (Deoxyinosine): Modified nucleotide, wobble base pairing

Used for nucleic acid chain identification and DNA structure analysis. The ‘D’ prefix distinguishes DNA nucleotides from RNA nucleotides.

Type:

List[str]

molr.constants.pdb_constants.PDB_ATOM_TO_ELEMENT: Dict[str, str] = {'BR': 'BR', 'C': 'C', "C1'": 'C', 'C2': 'C', "C2'": 'C', "C3'": 'C', 'C4': 'C', "C4'": 'C', 'C5': 'C', "C5'": 'C', 'C5M': 'C', 'C6': 'C', 'C8': 'C', 'CA': 'C', 'CB': 'C', 'CD': 'C', 'CE': 'C', 'CG': 'C', 'CL': 'CL', 'CZ': 'C', 'D': 'D', 'F': 'F', 'H': 'H', 'HA': 'H', 'HB': 'H', 'HD': 'H', 'HE': 'H', 'HG': 'H', 'HH': 'H', 'HN': 'H', 'HO': 'H', 'HOH': 'H', 'HS': 'H', 'HZ': 'H', 'I': 'I', 'N': 'N', 'N1': 'N', 'N2': 'N', 'N3': 'N', 'N4': 'N', 'N6': 'N', 'N7': 'N', 'N9': 'N', 'ND1': 'N', 'ND2': 'N', 'NE': 'N', 'NE1': 'N', 'NE2': 'N', 'NH1': 'N', 'NH2': 'N', 'NZ': 'N', 'O': 'O', 'O2': 'O', "O2'": 'O', "O3'": 'O', 'O4': 'O', "O4'": 'O', "O5'": 'O', 'O6': 'O', 'OD1': 'O', 'OD2': 'O', 'OE1': 'O', 'OE2': 'O', 'OG': 'O', 'OG1': 'O', 'OH': 'O', 'OH2': 'O', 'OP1': 'O', 'OP2': 'O', 'P': 'P', 'SD': 'S', 'SG': 'S'}#

Pre-computed mapping of common PDB atom names to their element types.

This dictionary provides fast lookup for the most frequently encountered PDB atoms. For comprehensive coverage, including unusual atoms, use the pdb_atom_to_element() function, which falls back to regex-based pattern matching.

Coverage includes:
  • Protein backbone and common side chain atoms

  • DNA/RNA backbone and nucleotide base atoms

  • Standard hydrogen atoms

  • Water molecules

For full pattern-based mapping, use the pdb_atom_to_element() function instead. It handles:
  • Greek letter remoteness indicators (CA, CB, CG, CD, CE, CZ, CH)

  • Numbered variants (C1’, H2’’, OP1, etc.)

  • Ion charges (CA2+, MG2+, etc.)

  • IUPAC hydrogen naming conventions

  • Uncommon PDB atom names

Used for:
  • Looking up atomic properties (radius, mass, electronegativity)

  • Covalent bond detection

  • Van der Waals calculations

  • Molecular mass calculations

Type:

Dict[str, str]

molr.constants.pdb_constants.PROTEIN_BACKBONE_ATOMS: List[str] = ['N', 'CA', 'C', 'O']#

Standard protein backbone atom names in PDB format.

Defines the four atoms that form the protein backbone (main chain):
  • N: Amino nitrogen atom

  • CA: Alpha carbon atom (central carbon)

  • C: Carbonyl carbon atom

  • O: Carbonyl oxygen atom

These atoms are present in every standard amino acid residue, including proline, whose backbone N is part of its side-chain ring and carries no hydrogen. The peptide bond links the C atom of one residue to the N atom of the next.

Type:

List[str]

molr.constants.pdb_constants.DNA_RNA_BACKBONE_ATOMS: List[str] = ['P', 'OP1', 'OP2', "O5'", "C5'", "C4'", "O4'", "C3'", "O3'", "C2'", "O2'", "C1'"]#

Standard DNA/RNA backbone atom names in PDB format.

Sugar-phosphate backbone atoms:
  • P: Phosphorus atom

  • OP1, OP2: Non-bridging phosphate oxygens

  • O5’: 5’ phosphate oxygen (bridging)

  • C5’: 5’ carbon of ribose/deoxyribose

  • C4’: 4’ carbon of ribose/deoxyribose

  • O4’: 4’ oxygen of ribose/deoxyribose (ring oxygen)

  • C3’: 3’ carbon of ribose/deoxyribose

  • O3’: 3’ phosphate oxygen (bridging)

  • C2’: 2’ carbon of ribose/deoxyribose

  • O2’: 2’ hydroxyl oxygen (RNA only, absent in DNA)

  • C1’: 1’ carbon of ribose/deoxyribose (anomeric carbon)

Note: O2’ is present in RNA but absent in DNA (deoxyribose lacks 2’ hydroxyl).

Type:

List[str]

molr.constants.pdb_constants.BACKBONE_ATOMS: List[str] = ['N', 'CA', 'C', 'O', 'P', 'OP1', 'OP2', "O5'", "C5'", "C4'", "O4'", "C3'", "O3'", "C2'", "O2'", "C1'"]#

Combined backbone atom names for proteins, DNA, and RNA in PDB format.

This list is the combination of PROTEIN_BACKBONE_ATOMS and DNA_RNA_BACKBONE_ATOMS, providing a comprehensive set of backbone atoms for all major biomolecule types.

Used for:
  • Backbone hydrogen bond identification across all molecule types

  • Secondary structure analysis

  • Main chain vs side chain/base classification

  • Nucleic acid backbone conformation analysis

Type:

List[str]

molr.constants.pdb_constants.PROTEIN_SIDECHAIN_ATOMS: List[str] = ['CB', 'CG', 'CD', 'NE', 'CZ', 'NH1', 'NH2', 'OD1', 'ND2', 'OD2', 'SG', 'OE1', 'NE2', 'OE2', 'CD2', 'ND1', 'CE1', 'CG1', 'CG2', 'CD1', 'CE', 'NZ', 'SD', 'CE2', 'OG', 'OG1', 'NE1', 'CE3', 'CZ2', 'CZ3', 'CH2', 'OH']#

Common protein side chain atom names in PDB format.

Comprehensive list of side chain (R-group) atoms found in the 20 standard amino acids:
  • Carbon atoms: CB, CG, CG1, CG2, CD, CD1, CD2, CE, CE1, CE2, CE3, CZ, CZ2, CZ3, CH2 (aliphatic chains and aromatic rings)

  • Nitrogen atoms: NE, NH1, NH2, ND1, ND2, NE1, NE2, NZ (amine, guanidinium, amide, and ring nitrogens)

  • Oxygen atoms: OD1, OD2, OE1, OE2, OG, OG1, OH (carboxylate, amide carbonyl, and hydroxyl groups)

  • Sulfur atoms: SG, SD (cysteine, methionine)

Used for:
  • Side chain interaction analysis

  • Functional group identification

  • Hydrogen bond donor/acceptor classification

Type:

List[str]

molr.constants.pdb_constants.DNA_RNA_BASE_ATOMS: List[str] = ['N1', 'C2', 'N3', 'C4', 'C5', 'C6', 'N6', 'N7', 'C8', 'N9', 'O6', 'N2', 'O2', 'N4', 'O4', 'C5M']#

Common DNA/RNA base atom names in PDB format.

Base atoms found in nucleotides.

Purine bases (Adenine, Guanine):

  • N1, C2, N3, C4, C5, C6: Six-membered ring atoms

  • N7, C8, N9: Five-membered ring atoms

  • N6: Amino group on adenine

  • O6, N2: Functional groups on guanine

Pyrimidine bases (Cytosine, Thymine, Uracil):
  • N1, C2, N3, C4, C5, C6: Six-membered ring atoms

  • O2: Carbonyl oxygen at position 2

  • N4: Amino group on cytosine

  • O4: Carbonyl oxygen at position 4 (thymine/uracil)

  • C5M: Thymine methyl carbon (named C7 in current wwPDB nomenclature)

Used for:
  • Base-base interactions (hydrogen bonding, stacking)

  • Protein-nucleic acid recognition

  • Base functional group identification

Type:

List[str]

molr.constants.pdb_constants.SIDECHAIN_ATOMS: List[str] = ['CB', 'CG', 'CD', 'NE', 'CZ', 'NH1', 'NH2', 'OD1', 'ND2', 'OD2', 'SG', 'OE1', 'NE2', 'OE2', 'CD2', 'ND1', 'CE1', 'CG1', 'CG2', 'CD1', 'CE', 'NZ', 'SD', 'CE2', 'OG', 'OG1', 'NE1', 'CE3', 'CZ2', 'CZ3', 'CH2', 'OH', 'N1', 'C2', 'N3', 'C4', 'C5', 'C6', 'N6', 'N7', 'C8', 'N9', 'O6', 'N2', 'O2', 'N4', 'O4', 'C5M']#

Combined side chain and base atoms for proteins and nucleic acids.

This list is the combination of PROTEIN_SIDECHAIN_ATOMS and DNA_RNA_BASE_ATOMS, providing a comprehensive set of non-backbone atoms for all major biomolecule types.

Used for:
  • Side chain/base interaction analysis

  • Distinguishing backbone from functional groups

  • Molecular recognition studies

Type:

List[str]

molr.constants.pdb_constants.WATER_MOLECULES: List[str] = ['HOH', 'WAT', 'DOD', 'TIP3', 'TIP4', 'TIP5', 'W']#

Standard water molecule residue names in PDB files.

Recognition patterns for different water representations:
  • HOH: Standard PDB water molecule designation

  • WAT: Alternative water molecule name

  • DOD: Deuterated water (heavy water)

  • TIP3: TIP3P water model (3-point)

  • TIP4: TIP4P water model (4-point)

  • TIP5: TIP5P water model (5-point)

  • W: Abbreviated water designation

Used for:
  • Water molecule identification in PDB structures

  • Solvent exclusion during analysis

  • Water-mediated interaction detection

  • Hydration shell analysis

Type:

List[str]

molr.constants.pdb_constants.RESIDUES: List[str] = ['ALA', 'ASN', 'CYS', 'GLU', 'HIS', 'LEU', 'MET', 'PRO', 'THR', 'TYR', 'ARG', 'ASP', 'GLN', 'GLY', 'ILE', 'LYS', 'PHE', 'SER', 'TRP', 'VAL', 'DA', 'DG', 'DC', 'DT', 'DI', 'A', 'G', 'C', 'U', 'I', 'HOH', 'WAT', 'DOD', 'TIP3', 'TIP4', 'TIP5', 'W']#

Combined list of all standard residue codes for proteins, DNA, RNA, and water.

This list is the concatenation of PROTEIN_RESIDUES, DNA_RESIDUES, RNA_RESIDUES, and WATER_MOLECULES (in that order), providing a comprehensive set of standard residues found in biomolecular structures.

Used for:
  • General residue type validation

  • Distinguishing standard residues from heterogens

  • Biomolecule type identification

Type:

List[str]

molr.constants.pdb_constants.RESIDUES_WITH_AROMATIC_RINGS: List[str] = ['PHE', 'TYR', 'TRP', 'HIS', 'HID', 'HIE', 'HIP', 'TYI', 'TYQ', 'TYB', 'DA', 'DG', 'DC', 'DT', 'A', 'G', 'C', 'U']#

Residues containing aromatic rings in their structures. This list includes:

Protein residues:

  • PHE: Phenylalanine (benzene ring)

  • TYR: Tyrosine (phenolic ring)

  • TRP: Tryptophan (indole ring)

  • HIS: Histidine (imidazole ring)

  • HID, HIE, HIP: Different protonation states of histidine

  • TYI, TYQ, TYB: Variants of tyrosine with modifications

DNA nucleotides:
  • DA: Deoxyadenosine (purine ring: adenine)

  • DG: Deoxyguanosine (purine ring: guanine)

  • DC: Deoxycytidine (pyrimidine ring: cytosine)

  • DT: Deoxythymidine (pyrimidine ring: thymine)

RNA nucleotides:
  • A: Adenine (purine ring)

  • G: Guanine (purine ring)

  • C: Cytosine (pyrimidine ring)

  • U: Uracil (pyrimidine ring)

Used for:
  • Aromatic interaction analysis

  • π-π stacking detection between proteins and nucleic acids

  • DNA/RNA-protein interface studies

Type:

List[str]

molr.constants.pdb_constants.HYDROGEN_ELEMENTS: List[str] = ['H', 'D']#

Hydrogen element types including isotopes.

Contains the hydrogen element symbols commonly found in PDB structures:
  • H: Standard hydrogen (protium)

  • D: Deuterium (heavy hydrogen isotope)

Used for:
  • Hydrogen bond donor/acceptor detection

  • Identifying hydrogen atoms in molecular interactions

  • Mass calculations and isotope effects

  • NMR-related structural analysis

Type:

List[str]

molr.constants.pdb_constants.HALOGEN_ELEMENTS: List[str] = ['F', 'CL', 'BR', 'I']#

Elements that can participate in halogen bonding as donors.

When covalently bonded to carbon (C-X…Y geometry), these halogens act as halogen bond donors. The electron-deficient σ-hole on the halogen, opposite the C-X bond, interacts with electron-rich regions on acceptor atoms. In this sense the halogen accepts electron density even though it is called the bond donor.

  • F: Fluorine (weakest halogen bond donor due to high electronegativity)

  • CL: Chlorine (common in drug design, moderate halogen bonding)

  • BR: Bromine (strong halogen bond donor, commonly studied)

  • I: Iodine (strongest halogen bond donor due to large, polarizable electron cloud)

Type:

List[str]

molr.constants.pdb_constants.HYDROGEN_BOND_DONOR_ELEMENTS: List[str] = ['N', 'O', 'S', 'F']#

Elements that can act as hydrogen bond donors.

These elements can form hydrogen bonds when covalently bonded to hydrogen atoms (D-H…A geometry). They are electronegative enough to polarize the D-H bond, creating a partial positive charge on the hydrogen that can interact with electron-rich acceptor atoms.

  • N: Nitrogen (amino groups, ring nitrogens, strong donors)

  • O: Oxygen (hydroxyl groups, moderate to strong donors)

  • S: Sulfur (thiol groups, weak donors due to lower electronegativity)

  • F: Fluorine (donates only as H-F, which is essentially absent from biomolecular structures)

Type:

List[str]

molr.constants.pdb_constants.HYDROGEN_BOND_ACCEPTOR_ELEMENTS: List[str] = ['N', 'O', 'S', 'F', 'CL']#

Elements that can act as hydrogen bond acceptors.

These electronegative elements have lone pairs of electrons that can accept hydrogen bonds from donor atoms (D-H…A geometry). They can form favorable electrostatic interactions with the partial positive charge on hydrogen.

  • N: Nitrogen (lone pairs on amino groups, ring nitrogens)

  • O: Oxygen (lone pairs on carbonyl, hydroxyl, ether groups - strongest acceptors)

  • S: Sulfur (lone pairs on thiol, sulfide groups - weaker acceptors)

  • F: Fluorine (highest electronegativity, but covalently bound fluorine is a weak acceptor; rare in proteins)

  • CL: Chlorine (weak acceptor when covalently bound, e.g., in modified residues or ligands; chloride ions accept more strongly)

Type:

List[str]
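The donor and acceptor element lists only say which atoms are eligible; a hydrogen bond test also needs geometry. The sketch combines the lists with a distance cutoff on H…A and an angle cutoff on D-H…A. Both cutoff values and the helper `is_hydrogen_bond` are assumptions chosen for illustration. MolR's own thresholds are not given on this page.

```python
import numpy as np

# Element lists copied from this page.
HYDROGEN_BOND_DONOR_ELEMENTS = ["N", "O", "S", "F"]
HYDROGEN_BOND_ACCEPTOR_ELEMENTS = ["N", "O", "S", "F", "CL"]

# Assumed cutoffs for illustration only.
MAX_H_A_DISTANCE = 2.5  # Å
MIN_DHA_ANGLE = 120.0   # degrees


def is_hydrogen_bond(d_xyz, h_xyz, a_xyz, d_elem: str, a_elem: str) -> bool:
    """Hypothetical D-H...A test using element lists plus distance/angle cutoffs."""
    if d_elem not in HYDROGEN_BOND_DONOR_ELEMENTS:
        return False
    if a_elem not in HYDROGEN_BOND_ACCEPTOR_ELEMENTS:
        return False
    d, h, a = (np.asarray(p, dtype=float) for p in (d_xyz, h_xyz, a_xyz))
    if np.linalg.norm(a - h) > MAX_H_A_DISTANCE:
        return False
    # Angle at the hydrogen between the H->D and H->A vectors.
    u, v = d - h, a - h
    cos_angle = np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))
    angle = np.degrees(np.arccos(np.clip(cos_angle, -1.0, 1.0)))
    return bool(angle >= MIN_DHA_ANGLE)


# Linear N-H...O: H at 1.0 Å from N, O 2.0 Å beyond H (angle 180 degrees).
print(is_hydrogen_bond([0, 0, 0], [1, 0, 0], [3, 0, 0], "N", "O"))  # True
```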

molr.constants.pdb_constants.HALOGEN_BOND_ACCEPTOR_ELEMENTS: List[str] = ['N', 'O', 'S']#

Elements that can act as halogen bond acceptors.

These electronegative atoms can donate electron density to the σ-hole of halogen atoms in halogen bonds. They typically have lone pairs of electrons that can interact with the positive electrostatic potential of the halogen.

  • N: Nitrogen (lone pairs on amino groups, ring nitrogens)

  • O: Oxygen (lone pairs on carbonyl, hydroxyl, ether groups)

  • S: Sulfur (lone pairs on thiol, sulfide groups, weaker than N/O)

Type:

List[str]

molr.constants.pdb_constants.PI_INTERACTION_DONOR: List[str] = ['C']#

Elements that can act as π-interaction donors.

These atoms can participate in π-interactions when part of π-systems. Currently includes:
  • C: Carbon atoms

Type:

List[str]

molr.constants.pdb_constants.PI_INTERACTION_ATOMS: List[str] = ['H', 'F', 'CL']#

Elements that can interact with an aromatic π system from outside the ring, for example the H in X-H…π contacts or the halogen in C-X…π contacts.

Type:

List[str]

molr.constants.pdb_constants.RING_ATOMS_FOR_RESIDUES_WITH_AROMATIC_RINGS: Dict[str, List[str]] = {'A': ['N9', 'C8', 'N7', 'C5', 'C6', 'N1', 'C2', 'N3', 'C4'], 'C': ['N1', 'C2', 'N3', 'C4', 'C5', 'C6'], 'DA': ['N9', 'C8', 'N7', 'C5', 'C6', 'N1', 'C2', 'N3', 'C4'], 'DC': ['N1', 'C2', 'N3', 'C4', 'C5', 'C6'], 'DG': ['N9', 'C8', 'N7', 'C5', 'C6', 'N1', 'C2', 'N3', 'C4'], 'DT': ['N1', 'C2', 'N3', 'C4', 'C5', 'C6'], 'G': ['N9', 'C8', 'N7', 'C5', 'C6', 'N1', 'C2', 'N3', 'C4'], 'HID': ['CG', 'ND1', 'CD2', 'CE1', 'NE2'], 'HIE': ['CG', 'ND1', 'CD2', 'CE1', 'NE2'], 'HIP': ['CG', 'ND1', 'CD2', 'CE1', 'NE2'], 'HIS': ['CG', 'ND1', 'CD2', 'CE1', 'NE2'], 'PHE': ['CG', 'CD1', 'CD2', 'CE1', 'CE2', 'CZ'], 'TRP': ['CG', 'CD1', 'CD2', 'NE1', 'CE2', 'CE3', 'CZ2', 'CZ3', 'CH2'], 'TYB': ['CG', 'CD1', 'CD2', 'CE1', 'CE2', 'CZ'], 'TYI': ['CG', 'CD1', 'CD2', 'CE1', 'CE2', 'CZ'], 'TYQ': ['CG', 'CD1', 'CD2', 'CE1', 'CE2', 'CZ'], 'TYR': ['CG', 'CD1', 'CD2', 'CE1', 'CE2', 'CZ'], 'U': ['N1', 'C2', 'N3', 'C4', 'C5', 'C6']}#

Mapping of aromatic residues to their ring atom names.

This dictionary provides the specific atom names that form aromatic ring systems for each residue type containing aromatic groups:

Protein residues:

Phenylalanine (PHE) and variants:

  • 6-membered benzene ring: CG-CD1-CE1-CZ-CE2-CD2

Tyrosine (TYR, TYI, TYQ, TYB) and variants:
  • 6-membered phenolic ring: CG-CD1-CE1-CZ-CE2-CD2

  • TYI: 3,5-Diiodotyrosine

  • TYQ: Quinone form of tyrosine

  • TYB: Brominated tyrosine

Tryptophan (TRP):
  • 5-membered pyrrole ring: CG-CD1-NE1-CE2-CD2

  • 6-membered benzene ring: CD2-CE2-CZ2-CH2-CZ3-CE3

  • Forms bicyclic indole system

Histidine (HIS, HID, HIE, HIP):
  • 5-membered imidazole ring: CG-ND1-CE1-NE2-CD2

  • HID: Delta protonated (H on ND1)

  • HIE: Epsilon protonated (H on NE2)

  • HIP: Both nitrogens protonated (positive charge)

DNA nucleotides:

Adenine (DA) and Guanine (DG) - Purine bases:

  • 5-membered ring: N9-C8-N7-C5-C4

  • 6-membered ring: C5-C6-N1-C2-N3-C4

  • Forms bicyclic purine system

Cytosine (DC) and Thymine (DT) - Pyrimidine bases:
  • 6-membered ring: N1-C2-N3-C4-C5-C6

RNA nucleotides:

Adenine (A) and Guanine (G) - Purine bases:

  • Same purine ring system as DNA counterparts

Cytosine (C) and Uracil (U) - Pyrimidine bases:
  • Same pyrimidine ring system as DNA counterparts

Used for:
  • Calculating aromatic ring centroids for π interactions

  • Identifying atoms involved in π-π stacking

  • Determining ring plane orientations

  • X-H…π interaction analysis where these atoms form the π system

  • DNA/RNA-protein interface interactions

  • Nucleotide base stacking analysis

Type:

Dict[str, List[str]]
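Given the coordinates of the atoms named in this mapping, a ring centroid and plane normal can be obtained with a singular value decomposition. The normal is the direction of least variance of the centred coordinates. This is a standard technique, shown here as a hypothetical helper rather than MolR's code. For TRP and the purines the list covers both fused rings, so a centroid over the full list is the centroid of the bicyclic system; per-ring centroids would use the 5- and 6-membered subsets given above.

```python
import numpy as np


def ring_centroid_and_normal(ring_coord: np.ndarray):
    """Centroid and unit normal of a (roughly planar) ring from its atom coordinates."""
    centroid = ring_coord.mean(axis=0)
    # The last right singular vector belongs to the smallest singular value,
    # i.e. the direction perpendicular to the best-fit plane.
    _, _, vt = np.linalg.svd(ring_coord - centroid)
    normal = vt[-1]
    return centroid, normal / np.linalg.norm(normal)


# Idealised benzene-like hexagon (radius 1.39 Å) in the xy-plane.
angles = np.deg2rad(np.arange(0, 360, 60))
hexagon = np.column_stack([1.39 * np.cos(angles), 1.39 * np.sin(angles), np.zeros(6)])
centroid, normal = ring_centroid_and_normal(hexagon)
print(np.round(centroid, 6), np.round(np.abs(normal), 6))
```

The sign of the normal is arbitrary, so angle-based criteria for π stacking usually use the absolute value of the dot product between normals.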

molr.constants.pdb_constants.HYDROPHOBIC_RESIDUES: List[str] = ['VAL', 'LEU', 'ILE', 'MET', 'PHE', 'TRP', 'PRO', 'ALA']#

Hydrophobic amino acid residues with nonpolar side chains.

These amino acids have side chains that are predominantly nonpolar and hydrophobic:
  • VAL (Valine): Branched aliphatic chain

  • LEU (Leucine): Branched aliphatic chain

  • ILE (Isoleucine): Branched aliphatic chain

  • MET (Methionine): Sulfur-containing nonpolar chain

  • PHE (Phenylalanine): Aromatic benzyl group

  • TRP (Tryptophan): Aromatic indole group

  • PRO (Proline): Cyclic imino acid structure

  • ALA (Alanine): Simple methyl group

Used for:
  • Hydrophobic interaction analysis

  • Protein folding studies

  • Membrane protein analysis

  • Hydrophobic patch identification

Type:

List[str]

molr.constants.pdb_constants.CHARGED_RESIDUES: List[str] = ['ARG', 'LYS', 'ASP', 'GLU', 'HIS']#

Charged amino acid residues with ionizable side chains.

These amino acids have ionizable side chains. ARG, LYS, ASP, and GLU are charged at physiological pH, while HIS is only partly protonated:
  • ARG (Arginine): Positively charged guanidinium group (+1)

  • LYS (Lysine): Positively charged amino group (+1)

  • ASP (Aspartic acid): Negatively charged carboxylate group (-1)

  • GLU (Glutamic acid): Negatively charged carboxylate group (-1)

  • HIS (Histidine): Can be positively charged imidazolium group (pKa ~6)

Used for:
  • Electrostatic interaction analysis

  • Salt bridge identification

  • pH-dependent behavior studies

  • Ion binding site analysis

Type:

List[str]
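The histidine entry can be quantified with the Henderson-Hasselbalch relation, which gives the protonated fraction of a site as 1 / (1 + 10^(pH - pKa)). With the approximate pKa of 6 quoted above, only about 4% of histidine side chains are protonated at pH 7.4 in this simple model. The local protein environment can shift the pKa considerably. `fraction_protonated` is a hypothetical helper.

```python
def fraction_protonated(pH: float, pKa: float) -> float:
    """Henderson-Hasselbalch: fraction of a titratable site in its protonated form."""
    return 1.0 / (1.0 + 10.0 ** (pH - pKa))


# Histidine imidazole with the approximate pKa of 6 quoted above.
for pH in (5.0, 6.0, 7.4):
    print(pH, round(fraction_protonated(pH, 6.0), 3))
```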

molr.constants.pdb_constants.RESIDUE_TYPES: List[str] = ['DNA', 'RNA', 'PROTEIN', 'LIGAND']#

Standard residue type classifications for molecular analysis.

Classification categories for different types of molecular residues:
  • DNA: Deoxyribonucleotide residues (DA, DG, DC, DT, DI)

  • RNA: Ribonucleotide residues (A, G, C, U, I)

  • PROTEIN: Amino acid residues (20 standard amino acids and variants)

  • LIGAND: Ligands, cofactors, metals, and other heteroatom residues

Used for:
  • Residue type identification and classification

  • Molecular component analysis

  • Structure validation and processing

  • Interaction type determination

Type:

List[str]

molr.constants.pdb_constants.RESIDUE_TYPE_CODES: Dict[str, str] = {'DNA': 'D', 'LIGAND': 'L', 'PROTEIN': 'P', 'RNA': 'R'}#

Single letter codes for residue types.

Mapping of full residue type names to compact single letter codes:
  • “DNA” → “D”: Deoxyribonucleotide residues

  • “RNA” → “R”: Ribonucleotide residues

  • “PROTEIN” → “P”: Amino acid residues

  • “LIGAND” → “L”: Ligands, cofactors, metals, and other heteroatom residues

Used for compact representation in hydrogen bond descriptions and atom records.

Type:

Dict[str, str]
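The residue lists and type codes on this page combine into a simple classifier. In the sketch below anything that is not a standard protein, DNA, or RNA residue falls through to LIGAND. That rule, and the helper `residue_type`, are assumptions. Modified residues such as MSE land in LIGAND here unless they are first standardized with PROTEIN_SUBSTITUTIONS; whether MolR does so is not stated on this page.

```python
# Lists copied from this page.
PROTEIN_RESIDUES = ["ALA", "ASN", "CYS", "GLU", "HIS", "LEU", "MET", "PRO", "THR", "TYR",
                    "ARG", "ASP", "GLN", "GLY", "ILE", "LYS", "PHE", "SER", "TRP", "VAL"]
DNA_RESIDUES = ["DA", "DG", "DC", "DT", "DI"]
RNA_RESIDUES = ["A", "G", "C", "U", "I"]
RESIDUE_TYPE_CODES = {"DNA": "D", "LIGAND": "L", "PROTEIN": "P", "RNA": "R"}


def residue_type(res_name: str) -> str:
    """Hypothetical classifier: anything not protein/DNA/RNA is treated as LIGAND."""
    if res_name in PROTEIN_RESIDUES:
        return "PROTEIN"
    if res_name in DNA_RESIDUES:
        return "DNA"
    if res_name in RNA_RESIDUES:
        return "RNA"
    return "LIGAND"


print([RESIDUE_TYPE_CODES[residue_type(r)] for r in ["ALA", "DA", "A", "HEM"]])
# ['P', 'D', 'R', 'L']
```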

molr.constants.pdb_constants.BACKBONE_SIDECHAIN_CODES: Dict[str, str] = {'BACKBONE': 'B', 'NOT_APPLICABLE': 'N', 'SIDECHAIN': 'S'}#

Single letter codes for backbone vs sidechain classification.

Mapping of atom structural classification to compact single letter codes:
  • “BACKBONE” → “B”: Main chain atoms (protein backbone, DNA/RNA sugar-phosphate)

  • “SIDECHAIN” → “S”: Side chain atoms (protein R-groups, nucleotide bases)

  • “NOT_APPLICABLE” → “N”: Atoms for which the backbone/sidechain distinction does not apply

Used for describing hydrogen bond donor-acceptor relationships (e.g., S-S, S-B, B-B).

Type:

Dict[str, str]

molr.constants.pdb_constants.AROMATIC_CODES: Dict[str, str] = {'AROMATIC': 'A', 'NON-AROMATIC': 'N'}#

Single letter codes for aromatic classification.

Mapping of aromatic property classification to compact single letter codes:
  • “AROMATIC” → “A”: Atoms that are part of aromatic ring systems

  • “NON-AROMATIC” → “N”: Atoms that are not part of aromatic ring systems

Used for identifying atoms involved in π-interactions and aromatic stacking.

Type:

Dict[str, str]

PDB format constants and mappings.

Available Data:

  • Record type definitions

  • Standard residue names

  • Chain identifier mappings

Residue Bond Templates#

Residue Bond Information Constants

This module contains bond connectivity information for standard residues extracted from the Chemical Component Dictionary (CCD). This data is used for molecular structure validation and bond detection in MolR.

Generated automatically from CCD BinaryCIF files.

molr.constants.residue_bonds.get_residue_bonds(residue)[source]#

Get bond information for a specific residue.

Parameters:

residue (str) – Three-letter residue code (e.g., “ALA”, “GLY”)

Return type:

List[Dict[str, Union[str, bool]]]

Returns:

List of bond dictionaries with atom1, atom2, order, and aromatic info
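
The returned templates are convenient for connectivity lookups by atom name. The sketch below assumes the bond dictionaries use the keys `atom1`, `atom2`, `order`, and `aromatic`, as the description above suggests. `ala_bonds` is a hand-written, partial stand-in for `get_residue_bonds("ALA")` (heavy atoms only), its CCD-style `order` strings are an assumption rather than MolR's confirmed encoding, and `template_neighbors` is a hypothetical helper.

```python
from collections import defaultdict
from typing import Dict, List, Set, Union

Bond = Dict[str, Union[str, bool]]

# Hand-written stand-in for get_residue_bonds("ALA"), heavy atoms only.
# The "order" values are assumed CCD-style codes, not confirmed MolR output.
ala_bonds: List[Bond] = [
    {"atom1": "N", "atom2": "CA", "order": "SING", "aromatic": False},
    {"atom1": "CA", "atom2": "C", "order": "SING", "aromatic": False},
    {"atom1": "C", "atom2": "O", "order": "DOUB", "aromatic": False},
    {"atom1": "C", "atom2": "OXT", "order": "SING", "aromatic": False},
    {"atom1": "CA", "atom2": "CB", "order": "SING", "aromatic": False},
]


def template_neighbors(bonds: List[Bond]) -> Dict[str, Set[str]]:
    """Build an undirected atom-name adjacency map from a bond template."""
    neighbors: Dict[str, Set[str]] = defaultdict(set)
    for bond in bonds:
        a, b = str(bond["atom1"]), str(bond["atom2"])
        # Bonds are undirected, so record both directions.
        neighbors[a].add(b)
        neighbors[b].add(a)
    return dict(neighbors)


adjacency = template_neighbors(ala_bonds)
print(sorted(adjacency["CA"]))  # ['C', 'CB', 'N']
```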

molr.constants.residue_bonds.get_residue_bond_count(residue)

Get the total number of bonds for a residue.

Parameters:

residue (str) – Three-letter residue code

Return type:

int

Returns:

Number of bonds in the residue

molr.constants.residue_bonds.has_aromatic_bonds(residue)

Check if a residue has aromatic bonds.

Parameters:

residue (str) – Three-letter residue code

Return type:

bool

Returns:

True if the residue has aromatic bonds, False otherwise
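
Both helpers can be read as summaries of the bond list returned by `get_residue_bonds`. Assuming that relationship holds (it is not stated explicitly here), plain-Python equivalents look like the sketch below. `bond_count`, `any_aromatic`, and the two-bond `ring_fragment` are hypothetical and exist only for illustration.

```python
from typing import Dict, List, Union

Bond = Dict[str, Union[str, bool]]


def bond_count(bonds: List[Bond]) -> int:
    """Assumed equivalent of get_residue_bond_count: one entry per bond."""
    return len(bonds)


def any_aromatic(bonds: List[Bond]) -> bool:
    """Assumed equivalent of has_aromatic_bonds: any bond flagged aromatic."""
    return any(bool(bond.get("aromatic", False)) for bond in bonds)


# Hypothetical fragment of a phenyl ring (two of its six bonds).
ring_fragment: List[Bond] = [
    {"atom1": "CG", "atom2": "CD1", "order": "DOUB", "aromatic": True},
    {"atom1": "CD1", "atom2": "CE1", "order": "SING", "aromatic": True},
]
print(bond_count(ring_fragment), any_aromatic(ring_fragment))  # 2 True
```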

Standard topology templates for common residues.

Available Templates:

  • Amino acid topologies

  • Nucleotide topologies

  • Common ligand templates

  • Metal coordination patterns

Configuration

Configuration for the MolR package.

This module provides configuration paths and settings for the MolR package, including paths for CCD data storage.

molr.config.get_molr_data_dir()

Get the MolR data directory path.

Return type:

Path

Returns:

Path to MolR data directory (~/.molr)

molr.config.get_ccd_data_path()

Get the path for CCD data storage.

Return type:

Path

Returns:

Path to CCD data directory (~/.molr/ccd-data)
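
The documented defaults are `~/.molr` and `~/.molr/ccd-data`. Below is a minimal sketch of that layout with `pathlib`, assuming `~` means the user's home directory as returned by `Path.home()`. The `*_sketch` functions are hypothetical and take the home directory as a parameter so they are easy to test. Whether the real functions create these directories or honor environment overrides is not documented here.

```python
from pathlib import Path


def molr_data_dir_sketch(home: Path) -> Path:
    """Mirror of the documented default <home>/.molr (hypothetical helper)."""
    return home / ".molr"


def ccd_data_path_sketch(home: Path) -> Path:
    """Mirror of the documented default <home>/.molr/ccd-data."""
    return molr_data_dir_sketch(home) / "ccd-data"


# Resolve against the current user's home directory.
print(ccd_data_path_sketch(Path.home()))
```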

Global configuration settings for MolR.

Configuration Options:

  • Default bond detection parameters

  • Spatial indexing settings

  • I/O parser options

  • Selection language settings

Type Hints

MolR provides complete type hint coverage. Key type aliases:

from typing import Tuple
import numpy as np

# Common type aliases used throughout MolR
AtomIndex = int
AtomMask = np.ndarray  # Boolean array for atom selection
Coordinates = np.ndarray  # Shape (n_atoms, 3)
BondPair = Tuple[AtomIndex, AtomIndex]
SelectionString = str
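
To show the bond aliases in a signature, here is a plain-Python sketch of a connectivity matrix, similar in spirit to the `to_connectivity_matrix` call in the usage examples of this section. `connectivity_matrix` is a hypothetical helper. MolR's own method may return a NumPy or sparse array rather than nested lists.

```python
from typing import List, Tuple

# Aliases repeated from above so this sketch runs on its own.
AtomIndex = int
BondPair = Tuple[AtomIndex, AtomIndex]


def connectivity_matrix(bonds: List[BondPair], n_atoms: int) -> List[List[bool]]:
    """Build a symmetric n_atoms x n_atoms boolean adjacency matrix."""
    matrix = [[False] * n_atoms for _ in range(n_atoms)]
    for i, j in bonds:
        # Reject out-of-range indices (including negatives, which Python
        # would otherwise silently wrap around).
        if not (0 <= i < n_atoms and 0 <= j < n_atoms):
            raise IndexError(f"bond ({i}, {j}) out of range for {n_atoms} atoms")
        matrix[i][j] = True
        matrix[j][i] = True
    return matrix


m = connectivity_matrix([(0, 1), (1, 2)], n_atoms=3)
print(m[1])  # [True, False, True]
```

Writing both `matrix[i][j]` and `matrix[j][i]` keeps the matrix symmetric, since a bond has no direction.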

Usage Examples

Here are some common usage patterns for the API:

import molr
import numpy as np

# Load and analyze structure
structure = molr.Structure.from_pdb("protein.pdb")
bonds = structure.detect_bonds()

# Selection operations
backbone = structure.select("backbone")
active_site = structure.select("within 5.0 of (resname LIG)")

# Spatial queries
neighbors = structure.get_neighbors_within(100, 5.0)
sphere_atoms = structure.get_atoms_within_sphere([0, 0, 0], 10.0)

# Bond analysis
atom_neighbors = bonds.get_neighbors(100)
connectivity = bonds.to_connectivity_matrix(structure.n_atoms)

# Custom bond detection
from molr.bond_detection import DefaultBondDetector
detector = DefaultBondDetector()
custom_bonds = detector.detect_bonds(structure)
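
The sphere query above can be sketched as a brute-force distance test. This is an illustration under stated assumptions, not MolR's implementation: `atoms_within_sphere` is hypothetical, the boundary is treated as inclusive (`<=`), and because the configuration lists spatial indexing settings, the library likely uses an index rather than scanning every atom.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]


def atoms_within_sphere(
    coords: Sequence[Point], center: Point, radius: float
) -> List[int]:
    """Return indices of atoms whose distance to center is <= radius.

    Brute-force O(n) scan over every atom.
    """
    return [i for i, xyz in enumerate(coords) if math.dist(xyz, center) <= radius]


coords = [(0.0, 0.0, 0.0), (3.0, 4.0, 0.0), (10.0, 0.0, 0.0)]
print(atoms_within_sphere(coords, (0.0, 0.0, 0.0), 5.0))  # [0, 1]
```

With a spatial index (for example a k-d tree), only atoms in nearby regions need a distance check; the acceptance criterion stays the same.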

For more detailed examples, see the Examples section.