Regression & ThermoML

Differentiable parameter estimation (Levenberg-Marquardt and gradient descent), UNIFAC-to-binary prediction, the bundled ThermoML parameter bank, and the NIST ThermoML archive reader.

Regression

regression

Differentiable parameter estimation for activity-coefficient models.

Because every equilibrium output is differentiable with respect to the activity-model parameters (see fugacio.thermo.gammaphi, fugacio.thermo.lle), fitting a model to data is plain gradient-based optimisation: no finite-difference parameter sweeps, no black-box derivatives. This module supplies:

  • two self-contained optimisers that operate on an arbitrary parameter pytree: levenberg_marquardt for nonlinear least squares (Gauss-Newton Hessian built from the exact Jacobian, adaptive damping) and a simple gradient_descent;
  • residual builders that turn experimental data into a residual vector: bubble_pressure_residuals (isothermal/isobaric P-x-y VLE), activity_residuals (measured ln gamma), and lle_residuals (mutual-solubility / tie-line data); and
  • convenience fitters (fit_nrtl_binary, fit_uniquac_binary) that wire a model factory to the optimiser and return a ready model object.

A "model factory" is any theta -> ActivityModel mapping; the optimiser fits theta (the differentiable leaves you choose to expose), so you control which parameters are free and which are fixed.
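The factory idea can be sketched as a closure. This is only an illustration: ToyNRTL and make_factory are hypothetical stand-ins, not Fugacio classes or functions. Only b12 and b21 travel through theta, so only they would be fitted; alpha is captured by the closure and stays fixed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToyNRTL:
    """Hypothetical stand-in for an activity model (not a Fugacio class)."""

    b12: float
    b21: float
    alpha: float


def make_factory(alpha: float = 0.3):
    """Return a theta -> model mapping with alpha held fixed."""

    def make_model(theta):
        b12, b21 = theta  # the only free parameters an optimiser would see
        return ToyNRTL(b12=b12, b21=b21, alpha=alpha)

    return make_model


make_model = make_factory(alpha=0.3)
model = make_model((150.0, 420.0))  # illustrative numbers, not fitted values
```

To free alpha as well, the same pattern would unpack a third entry from theta instead of capturing it in the closure.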

Functions:

Name Description
levenberg_marquardt

Minimise 0.5 * sum(residual(theta)**2) by Levenberg-Marquardt.

gradient_descent

Minimise a scalar objective(theta) by fixed-step gradient descent.

bubble_pressure_residuals

Residuals of predicted vs. measured bubble pressure (and optionally vapour).

activity_residuals

Residuals of predicted vs. measured log activity coefficients.

lle_residuals

Isoactivity residuals at measured liquid-liquid tie-line ends.

fit_nrtl_binary

Fit binary NRTL b parameters (fixed alpha) to bubble-point data.

unifac_ln_gamma_grid

Sample (modified) UNIFAC ln gamma on a binary composition/temperature grid.

predict_nrtl_from_unifac

Predict binary NRTL b parameters by fitting to UNIFAC activity coefficients.

predict_uniquac_from_unifac

Predict binary UNIQUAC b parameters by fitting to UNIFAC activity coefficients.

fit_uniquac_binary

Fit binary UNIQUAC b parameters (with given r, q) to bubble-point data.

levenberg_marquardt

levenberg_marquardt(
    residual: ResidualFn,
    theta0: Any,
    *,
    max_iter: int = 100,
    lambda0: float = 0.01,
    factor: float = 5.0,
    tol: float = 1e-12,
) -> tuple[Any, Array]

Minimise 0.5 * sum(residual(theta)**2) by Levenberg-Marquardt.

A trust-region blend of Gauss-Newton and gradient descent: each step solves (J^T J + lambda diag(J^T J)) delta = -J^T r with the exact Jacobian J (via jax.jacobian), shrinking lambda after an accepted step and growing it after a rejected one. Operates on any parameter pytree theta0.
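The damped step and the accept/reject rule can be sketched in plain Python for a two-parameter toy problem. This is not the library's implementation: the model y = a * exp(b * x), the hand-coded Jacobian (the library obtains it by automatic differentiation), and the name lm_fit are assumptions made for illustration.

```python
import math


def residual_and_jacobian(theta, xs, ys):
    """Residuals and analytic Jacobian of the toy model a * exp(b * x)."""
    a, b = theta
    r, jac = [], []
    for x, y in zip(xs, ys):
        e = math.exp(b * x)
        r.append(a * e - y)
        jac.append((e, a * x * e))  # (d r / d a, d r / d b)
    return r, jac


def half_sse(r):
    return 0.5 * sum(v * v for v in r)


def lm_fit(theta, xs, ys, max_iter=100, lam=0.01, factor=5.0, tol=1e-12):
    r, jac = residual_and_jacobian(theta, xs, ys)
    cost = half_sse(r)
    for _ in range(max_iter):
        # Entries of J^T J and J^T r for the 2x2 normal equations.
        h11 = sum(j[0] * j[0] for j in jac)
        h12 = sum(j[0] * j[1] for j in jac)
        h22 = sum(j[1] * j[1] for j in jac)
        g1 = sum(j[0] * v for j, v in zip(jac, r))
        g2 = sum(j[1] * v for j, v in zip(jac, r))
        # Damping scales the diagonal: J^T J + lam * diag(J^T J).
        a11, a22 = h11 * (1.0 + lam), h22 * (1.0 + lam)
        det = a11 * a22 - h12 * h12
        d1 = (-g1 * a22 + g2 * h12) / det
        d2 = (-g2 * a11 + g1 * h12) / det
        trial = (theta[0] + d1, theta[1] + d2)
        r_t, jac_t = residual_and_jacobian(trial, xs, ys)
        cost_t = half_sse(r_t)
        if cost_t < cost:  # accepted: keep the step, trust the model more
            converged = cost - cost_t < tol
            theta, r, jac, cost = trial, r_t, jac_t, cost_t
            lam /= factor
            if converged:
                break
        else:  # rejected: stay put, lean towards gradient descent
            lam *= factor
    return theta, cost


xs = [0.0, 0.5, 1.0, 1.5, 2.0]
ys = [2.0 * math.exp(-1.5 * x) for x in xs]  # synthetic, noise-free data
theta_fit, cost = lm_fit((1.0, -1.0), xs, ys)
```

Small lambda gives nearly a Gauss-Newton step; large lambda gives a short step along the scaled negative gradient, which is why rejected steps raise it.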

Returns:

Type Description
tuple[Any, Array]

(theta, cost): the fitted parameter pytree and the final half-sum-of-squares cost.

gradient_descent

gradient_descent(
    objective: Callable[[Any], Array],
    theta0: Any,
    *,
    learning_rate: float = 0.01,
    max_iter: int = 500,
) -> tuple[Any, Array]

Minimise a scalar objective(theta) by fixed-step gradient descent.

A dependency-free fallback for objectives that are not least-squares; returns (theta, objective(theta)).
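The fixed-step update is simple enough to sketch directly. The version below is an illustration under assumptions: theta is a flat list and the gradient is hand-coded, whereas the library works on pytrees and differentiates the objective automatically. The name gradient_descent_sketch is hypothetical.

```python
def gradient_descent_sketch(grad, theta, learning_rate=0.01, max_iter=500):
    """Repeat theta <- theta - learning_rate * grad(theta) a fixed number of times."""
    for _ in range(max_iter):
        g = grad(theta)
        theta = [t - learning_rate * gi for t, gi in zip(theta, g)]
    return theta


def objective(theta):
    # A convex bowl with its minimum at (1, -3).
    return (theta[0] - 1.0) ** 2 + 2.0 * (theta[1] + 3.0) ** 2


def grad(theta):
    # Hand-written gradient of the objective above.
    return [2.0 * (theta[0] - 1.0), 4.0 * (theta[1] + 3.0)]


theta = gradient_descent_sketch(grad, [0.0, 0.0], learning_rate=0.1)
final_value = objective(theta)
```

With no step-size adaptation, the learning rate has to be small enough for the steepest direction of the objective; here 0.1 contracts both coordinates on every iteration.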

bubble_pressure_residuals

bubble_pressure_residuals(
    make_model: ModelFactory,
    t: Array,
    x: Array,
    p_exp: Array,
    tc: Array,
    pc: Array,
    omega: Array,
    *,
    y_exp: Array | None = None,
    p_scale: ArrayLike | None = None,
    y_weight: float = 1.0,
    **opts: Any,
) -> ResidualFn

Residuals of predicted vs. measured bubble pressure (and optionally vapour).

Parameters:

Name Type Description Default
make_model ModelFactory

theta -> ActivityModel factory.

required
t Array

Temperatures (K), shape (m,).

required
x Array

Liquid compositions, shape (m, n).

required
p_exp Array

Measured bubble pressures (Pa), shape (m,).

required
tc Array

Component critical temperatures (K).

required
pc Array

Component critical pressures (Pa).

required
omega Array

Component acentric factors.

required
y_exp Array | None

Optional measured vapour compositions, shape (m, n).

None
p_scale ArrayLike | None

Pressure normaliser (defaults to mean(p_exp)).

None
y_weight float

Relative weight on the vapour-composition residuals.

1.0
**opts Any

Forwarded to bubble_pressure_gamma (vapor, poynting, ...).

{}

Returns:

Type Description
ResidualFn

residual(theta) -> 1-D array for use with levenberg_marquardt.
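One plausible way to assemble such a residual vector is sketched below. The exact layout used by the library is not documented here, so the ordering (pressure terms first, then weighted vapour terms) and the helper name assemble_residuals are assumptions; the predicted values would come from the bubble-point calculation.

```python
def assemble_residuals(p_pred, p_exp, y_pred=None, y_exp=None,
                       p_scale=None, y_weight=1.0):
    """Stack scaled pressure residuals and optional vapour-composition residuals."""
    if p_scale is None:
        p_scale = sum(p_exp) / len(p_exp)  # documented default: mean(p_exp)
    res = [(pp - pe) / p_scale for pp, pe in zip(p_pred, p_exp)]
    if y_exp is not None:
        for yp_row, ye_row in zip(y_pred, y_exp):
            res.extend(y_weight * (yp - ye) for yp, ye in zip(yp_row, ye_row))
    return res


# Made-up numbers for illustration only.
p_exp = [100.0e3, 200.0e3]
p_pred = [103.0e3, 197.0e3]
res_p = assemble_residuals(p_pred, p_exp)
res_py = assemble_residuals(
    p_pred, p_exp,
    y_pred=[[0.6, 0.4], [0.3, 0.7]],
    y_exp=[[0.5, 0.5], [0.3, 0.7]],
    y_weight=2.0,
)
```

Dividing by a pressure scale makes the pressure terms dimensionless, so they are comparable in size to the mole-fraction terms that y_weight controls.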

activity_residuals

activity_residuals(
    make_model: ModelFactory,
    t: Array,
    x: Array,
    ln_gamma_exp: Array,
) -> ResidualFn

Residuals of predicted vs. measured log activity coefficients.

t is shape (m,), x and ln_gamma_exp are shape (m, n).

lle_residuals

lle_residuals(
    make_model: ModelFactory,
    t: Array,
    x_i_exp: Array,
    x_ii_exp: Array,
) -> ResidualFn

Isoactivity residuals at measured liquid-liquid tie-line ends.

A consistent model makes each experimental conjugate pair iso-active: x_i^I gamma_i^I = x_i^II gamma_i^II. t is (m,); the compositions are (m, n).
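The isoactivity condition can be checked with a toy model. The sketch below uses a symmetric two-suffix Margules model (ln gamma_1 = A x_2^2, ln gamma_2 = A x_1^2) rather than any Fugacio model, and writes the residual in logarithmic form; both choices are assumptions made for illustration.

```python
import math


def margules_ln_gamma(x1, a):
    """Two-suffix Margules: (ln gamma_1, ln gamma_2) for a binary."""
    x2 = 1.0 - x1
    return a * x2 * x2, a * x1 * x1


def isoactivity_residuals(x1_i, x1_ii, a):
    """ln(x_i gamma_i) in phase I minus the same quantity in phase II."""
    lg_i = margules_ln_gamma(x1_i, a)
    lg_ii = margules_ln_gamma(x1_ii, a)
    xs_i = (x1_i, 1.0 - x1_i)
    xs_ii = (x1_ii, 1.0 - x1_ii)
    return [
        math.log(xs_i[k]) + lg_i[k] - math.log(xs_ii[k]) - lg_ii[k]
        for k in range(2)
    ]


# A symmetric tie line x1 = s in phase I and x1 = 1 - s in phase II is
# reproduced exactly when A = ln((1 - s) / s) / (1 - 2 s).
s = 0.1
a_consistent = math.log((1.0 - s) / s) / (1.0 - 2.0 * s)
res_ok = isoactivity_residuals(s, 1.0 - s, a_consistent)
res_off = isoactivity_residuals(s, 1.0 - s, 2.0)
```

Because the residuals are evaluated at the measured compositions, no liquid-liquid flash has to be solved inside the fitting loop.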

fit_nrtl_binary

fit_nrtl_binary(
    t: Array,
    x: Array,
    p_exp: Array,
    tc: Array,
    pc: Array,
    omega: Array,
    *,
    alpha: float = 0.3,
    y_exp: Array | None = None,
    b0: tuple[float, float] = (0.0, 0.0),
    max_iter: int = 80,
    **opts: Any,
) -> tuple[NRTL, Array]

Fit binary NRTL b parameters (fixed alpha) to bubble-point data.

The free parameters are the two 1/T interaction coefficients b12, b21 (Kelvin); a = 0 and the non-randomness alpha are held fixed. Returns (fitted NRTL, final cost).
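For reference, the textbook binary NRTL expressions with tau_ij = b_ij / T and G_ij = exp(-alpha * tau_ij) can be sketched as follows. The index convention (which of b12 and b21 multiplies which term) follows the common textbook form and is an assumption about Fugacio's orientation; the function name is hypothetical.

```python
import math


def nrtl_binary_ln_gamma(x1, t, b12, b21, alpha=0.3):
    """Textbook binary NRTL with tau_ij = b_ij / T (a = 0)."""
    x2 = 1.0 - x1
    tau12, tau21 = b12 / t, b21 / t
    g12, g21 = math.exp(-alpha * tau12), math.exp(-alpha * tau21)
    ln_g1 = x2 ** 2 * (
        tau21 * (g21 / (x1 + x2 * g21)) ** 2
        + tau12 * g12 / (x2 + x1 * g12) ** 2
    )
    ln_g2 = x1 ** 2 * (
        tau12 * (g12 / (x2 + x1 * g12)) ** 2
        + tau21 * g21 / (x1 + x2 * g21) ** 2
    )
    return ln_g1, ln_g2


# Illustrative parameters (K), not fitted values.
T, B12, B21 = 350.0, 200.0, 500.0
ln_g1_inf, ln_g2_pure = nrtl_binary_ln_gamma(0.0, T, B12, B21)
ideal = nrtl_binary_ln_gamma(0.4, T, 0.0, 0.0)
```

At infinite dilution of component 1 the expression reduces to ln gamma_1 = tau21 + tau12 * G12, and b12 = b21 = 0 recovers an ideal solution, which is what the default b0 = (0.0, 0.0) starts from.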

unifac_ln_gamma_grid

unifac_ln_gamma_grid(
    components: list[str],
    t: ArrayLike,
    *,
    points: int = 11,
    dortmund: bool = False,
    x_min: float = 0.02,
) -> tuple[Array, Array, Array]

Sample (modified) UNIFAC ln gamma on a binary composition/temperature grid.

This turns a predictive group-contribution model into pseudo-data for fitting a correlative NRTL/UNIQUAC model, the standard way to obtain binary interaction parameters for a pair that has no measured VLE.

Parameters:

Name Type Description Default
components list[str]

Exactly two component names with UNIFAC group assignments.

required
t ArrayLike

Temperature(s) (K); a scalar or 1-D array. The grid is the outer product of the temperatures with the composition samples.

required
points int

Number of liquid compositions sampled in (x_min, 1 - x_min).

11
dortmund bool

Use modified UNIFAC (Dortmund) instead of classic UNIFAC.

False
x_min float

Smallest mole fraction sampled (kept away from the pure limits).

0.02

Returns:

Type Description
tuple[Array, Array, Array]

(t_grid, x_grid, ln_gamma_grid) with shapes (m,), (m, 2), (m, 2) where m = points * n_temperatures.

Raises:

Type Description
ValueError

if components is not a binary pair.
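The shape bookkeeping of the grid can be sketched without UNIFAC itself. Two details below are assumptions: that the composition samples are evenly spaced and include both x_min and 1 - x_min, and that rows are ordered temperature-major. The helper name is hypothetical.

```python
def composition_temperature_grid(temps, points=11, x_min=0.02):
    """Outer product of temperatures with evenly spaced binary compositions."""
    step = (1.0 - 2.0 * x_min) / (points - 1)  # assumes points >= 2
    x1_samples = [x_min + i * step for i in range(points)]
    t_grid, x_grid = [], []
    for t in temps:
        for x1 in x1_samples:
            t_grid.append(t)
            x_grid.append((x1, 1.0 - x1))
    return t_grid, x_grid


t_grid, x_grid = composition_temperature_grid([300.0, 350.0], points=11)
m = len(t_grid)  # points * n_temperatures
```

Each row of x_grid sums to one, and evaluating the UNIFAC model row by row would give the matching (m, 2) array of ln gamma values.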

predict_nrtl_from_unifac

predict_nrtl_from_unifac(
    components: list[str],
    t: ArrayLike,
    *,
    alpha: float = 0.3,
    dortmund: bool = False,
    points: int = 11,
    x_min: float = 0.02,
    b0: tuple[float, float] = (0.0, 0.0),
    max_iter: int = 120,
) -> tuple[NRTL, Array]

Predict binary NRTL b parameters by fitting to UNIFAC activity coefficients.

UNIFAC supplies ln gamma over a composition (and temperature) grid; the two NRTL 1/T coefficients b12, b21 (with a = 0 and fixed alpha) are fitted to it by levenberg_marquardt. Use it to bootstrap a correlative model for a pair without measured data.

Returns:

Type Description
tuple[NRTL, Array]

(fitted NRTL, final cost).

predict_uniquac_from_unifac

predict_uniquac_from_unifac(
    components: list[str],
    t: ArrayLike,
    *,
    r: Array | None = None,
    q: Array | None = None,
    dortmund: bool = False,
    points: int = 11,
    x_min: float = 0.02,
    b0: tuple[float, float] = (0.0, 0.0),
    max_iter: int = 120,
) -> tuple[UNIQUAC, Array]

Predict binary UNIQUAC b parameters by fitting to UNIFAC activity coefficients.

Like predict_nrtl_from_unifac, but for UNIQUAC. The volume and surface-area parameters r and q default to the curated values (fugacio.thermo.data.uniquac_rq); the free parameters are the 1/T coefficients b of tau = exp(b/T).

Returns:

Type Description
tuple[UNIQUAC, Array]

(fitted UNIQUAC, final cost).

fit_uniquac_binary

fit_uniquac_binary(
    t: Array,
    x: Array,
    p_exp: Array,
    tc: Array,
    pc: Array,
    omega: Array,
    r: Array,
    q: Array,
    *,
    y_exp: Array | None = None,
    b0: tuple[float, float] = (0.0, 0.0),
    max_iter: int = 80,
    **opts: Any,
) -> tuple[UNIQUAC, Array]

Fit binary UNIQUAC b parameters (with given r, q) to bubble-point data.

Free parameters are the 1/T coefficients of ln tau = a + b/T with a = 0. Returns (fitted UNIQUAC, final cost).

Parameter bank

parameter_bank

Batch regression of ThermoML datasets into a reusable binary-parameter bank.

This is the bridge between the ThermoML reader (fugacio.thermo.thermoml) and the differentiable fitters (fugacio.thermo.regression): point it at parsed documents (or the bundled samples) and it fits a binary activity model to every isothermal P-x VLE table it can understand, recording the fitted parameters together with their provenance: source dataset, temperature range, point count, and the scaled root-mean-square pressure residual. The collected FittedBinary records form a ParameterBank that hands back ready-to-use NRTL models with the component ordering you ask for.

A bank fitted to the bundled samples ships with the package (parameter_bank.json, regenerated by scripts/gen_parameter_bank.py); load it with ParameterBank.load_bundled. Components are matched to the Fugacio database by CAS number, never by name, so document-local naming quirks ("water (H2O)") cannot mis-assign a dataset.

Classes:

Name Description
FittedBinary

A fitted binary interaction record with its provenance.

ParameterBank

A lookup table of fitted binary parameters, keyed by component pair.

Functions:

Name Description
fit_vle_dataset

Fit binary NRTL b parameters to one parsed P-x VLE dataset.

fit_bundled_samples

Fit every binary P-x VLE dataset among the bundled ThermoML samples.

FittedBinary dataclass

FittedBinary(
    components: tuple[str, str],
    model: str,
    alpha: float,
    b12: float,
    b21: float,
    t_min: float,
    t_max: float,
    n_points: int,
    rmse: float,
    source: str,
)

A fitted binary interaction record with its provenance.

Attributes:

Name Type Description
components tuple[str, str]

Database names (1, 2) in the orientation of b12/b21.

model str

Activity-model family (currently always "nrtl").

alpha float

The fixed NRTL non-randomness parameter used in the fit.

b12, b21 float

Fitted interaction coefficients b in tau = b/T (K).

t_min, t_max float

Temperature range of the underlying data (K).

n_points int

Number of data rows fitted.

rmse float

Root-mean-square residual of (P_pred - P_exp) / mean(P_exp).

source str

Provenance string (sample name or document citation).

Methods:

Name Description
nrtl

The fitted model as a ready-to-evaluate two-component NRTL.

nrtl

nrtl() -> NRTL

The fitted model as a ready-to-evaluate two-component NRTL.

ParameterBank

ParameterBank(entries: list[FittedBinary])

A lookup table of fitted binary parameters, keyed by component pair.

Pairs are stored orientation-free: get and nrtl accept the components in either order and return parameters oriented as requested (swapping b12/b21 when needed).
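Orientation-free storage with reorientation on lookup can be sketched as below. ToyRecord and ToyBank are hypothetical stand-ins for FittedBinary and ParameterBank, the internal dictionary keyed by frozenset is an assumption about how such a bank could be built, and the numbers are made up.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class ToyRecord:
    """Hypothetical stand-in for FittedBinary (only the oriented fields)."""

    components: tuple
    b12: float
    b21: float


class ToyBank:
    """Hypothetical stand-in for ParameterBank."""

    def __init__(self, entries):
        # A frozenset key ignores the order of the two names.
        self._by_pair = {frozenset(e.components): e for e in entries}

    def get(self, component_1, component_2):
        rec = self._by_pair.get(frozenset((component_1, component_2)))
        if rec is None or rec.components == (component_1, component_2):
            return rec
        # Requested in the opposite order: swap the directional parameters.
        return replace(
            rec,
            components=(component_1, component_2),
            b12=rec.b21,
            b21=rec.b12,
        )


bank = ToyBank([ToyRecord(("ethanol", "water"), -50.0, 600.0)])
as_stored = bank.get("ethanol", "water")
flipped = bank.get("water", "ethanol")
missing = bank.get("water", "benzene")
```

Swapping on lookup keeps a single stored record per pair, so the two orientations cannot drift apart.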

Methods:

Name Description
get

The record for a pair, reoriented so component_1 is component 1.

nrtl

A ready NRTL model for the pair, in the requested component order.

to_json

Serialize the bank (sorted, human-diffable).

from_json

Inverse of to_json.

load_bundled

The bank fitted to the bundled ThermoML samples (ships with the package).

Attributes:

Name Type Description
entries list[FittedBinary]

All records, sorted by component pair for stable iteration.

entries property

entries: list[FittedBinary]

All records, sorted by component pair for stable iteration.

get

get(
    component_1: str, component_2: str
) -> FittedBinary | None

The record for a pair, reoriented so component_1 is component 1.

nrtl

nrtl(component_1: str, component_2: str) -> NRTL

A ready NRTL model for the pair, in the requested component order.

Raises:

Type Description
KeyError

if the bank has no record for the pair.

to_json

to_json() -> str

Serialize the bank (sorted, human-diffable).

from_json classmethod

from_json(text: str) -> ParameterBank

Inverse of to_json.

load_bundled classmethod

load_bundled() -> ParameterBank

The bank fitted to the bundled ThermoML samples (ships with the package).

fit_vle_dataset

fit_vle_dataset(
    data: ThermoMLData,
    dataset: Dataset,
    *,
    alpha: float = 0.3,
    source: str = "",
    max_iter: int = 80,
) -> FittedBinary

Fit binary NRTL b parameters to one parsed P-x VLE dataset.

The dataset must be binary with a temperature column, a composition column for its first component, and a pressure column (the layout of every bundled VLE sample and of archive isothermal P-x tables).

Returns:

Type Description
FittedBinary

The FittedBinary record, oriented so component 1 is the dataset's first component.

Raises:

Type Description
ValueError

if the dataset is not a binary P-x table.

KeyError

if a compound cannot be matched to the component database.

fit_bundled_samples

fit_bundled_samples(
    names: list[str] | None = None, *, alpha: float = 0.3
) -> list[FittedBinary]

Fit every binary P-x VLE dataset among the bundled ThermoML samples.

Non-VLE samples (pure-component tables, missing columns) are skipped, so the driver can be pointed at the whole sample directory. This is the batch regression behind the bundled parameter bank.

ThermoML reader

thermoml

Reader for the NIST ThermoML archive XML format.

ThermoML (https://www.nist.gov/mml/acmd/trc/thermoml) is the IUPAC/NIST XML standard for thermophysical and thermochemical property data; the freely redistributable ThermoML Archive (https://www.nist.gov/mml/acmd/trc/thermoml/thermoml-archive) holds tens of thousands of experimental datasets. This module turns those files into tidy, typed tables you can feed straight into fugacio.thermo.regression, so a model can be fitted to real measurements and its predictions graded against them.

The parser is deliberately tolerant and dependency-free (standard-library xml.etree.ElementTree only):

  • XML namespaces are stripped, so files declaring the ThermoML namespace (or none) parse identically;
  • compounds, mixtures, variables, properties, and the numeric value rows are read by their local element names, matching the published schema without binding to a specific version;
  • each Dataset exposes its columns as aligned numeric rows plus convenience accessors (Dataset.temperature, Dataset.pressure, Dataset.mole_fraction) with pressure unit conversion to pascal.
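Namespace stripping with the standard library can be sketched as follows. This illustrates the general technique rather than the module's actual code: the helper name strip_namespaces, the namespace URI, and the tiny document are made up for the example.

```python
import xml.etree.ElementTree as ET


def strip_namespaces(root):
    """Rewrite '{uri}Local' tags to 'Local' in place and return the root."""
    for elem in root.iter():
        # Comments and processing instructions have non-string tags.
        if isinstance(elem.tag, str) and "}" in elem.tag:
            elem.tag = elem.tag.split("}", 1)[1]
    return root


# Hypothetical namespace URI and a minimal, illustrative document.
XML = (
    '<DataReport xmlns="http://example.org/hypothetical-ns">'
    "<Compound><sCommonName>water</sCommonName></Compound>"
    "</DataReport>"
)

root = strip_namespaces(ET.fromstring(XML))
name = root.find("Compound/sCommonName").text
```

After stripping, lookups by local element name work the same whether or not the file declared a default namespace.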

A couple of small, schema-faithful datasets ship with the package for tests and examples; see list_samples / load_sample.

Classes:

Name Description
Compound

A chemical compound declared in a ThermoML document.

Column

One variable or property column of a Dataset table.

Dataset

A PureOrMixtureData block: a table of measurements for one mixture.

ThermoMLData

A parsed ThermoML document: its compounds, datasets, and citation.

Functions:

Name Description
loads

Parse a ThermoML document from an in-memory string or bytes.

read_thermoml

Parse a ThermoML document from a path or open file object.

list_samples

Names (without extension) of the bundled ThermoML sample datasets.

sample_path

Filesystem path of a bundled sample (with or without the .xml suffix).

load_sample

Parse a bundled ThermoML sample by name (see list_samples).

Compound dataclass

Compound(
    org_num: int,
    name: str | None = None,
    formula: str | None = None,
    cas: str | None = None,
    inchikey: str | None = None,
)

A chemical compound declared in a ThermoML document.

Attributes:

Name Type Description
org_num int

The document-local organization number used to reference this compound from mixtures and composition variables.

name str | None

Common name, if given.

formula str | None

Molecular formula, if given.

cas str | None

CAS registry number, if present.

inchikey str | None

Standard InChIKey, if present.

Column dataclass

Column(
    number: int,
    role: str,
    kind: str,
    label: str,
    component: int | None = None,
)

One variable or property column of a Dataset table.

Attributes:

Name Type Description
number int

The nVarNumber / nPropNumber within the dataset.

role str

"variable" (an independent, controlled quantity) or "property" (a measured quantity).

kind str

The ThermoML type element local name, e.g. "eTemperature", "ePressure", "eComponentComposition".

label str

Human-readable label including units, e.g. "Pressure, kPa".

component int | None

For composition columns, the org_num of the component the fraction refers to; otherwise None.

quantity property

quantity: str

The label with any trailing unit removed.

unit property

unit: str | None

The unit parsed from the label, if any.
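One simple rule that is consistent with the "Pressure, kPa" example is to split the label at its last comma. Whether the library uses exactly this rule is an assumption, and split_label is a hypothetical helper.

```python
def split_label(label):
    """Split 'Quantity, unit' at the last comma; unit is None if there is no comma."""
    head, sep, tail = label.rpartition(",")
    if not sep:
        return label.strip(), None
    return head.strip(), tail.strip()


quantity, unit = split_label("Pressure, kPa")
bare_quantity, bare_unit = split_label("Mole fraction")
```

Splitting at the last comma rather than the first keeps quantity names that contain commas intact.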

Dataset dataclass

Dataset(
    components: tuple[int, ...],
    columns: tuple[Column, ...],
    rows: tuple[tuple[float, ...], ...],
    phase: str | None = None,
    number: int | None = None,
)

A PureOrMixtureData block: a table of measurements for one mixture.

The rows are aligned with columns; a missing cell is float('nan').

Attributes:

Name Type Description
components tuple[int, ...]

org_num of each component participating, in document order.

columns tuple[Column, ...]

The variable and property columns, in document order.

rows tuple[tuple[float, ...], ...]

Numeric rows aligned with columns.

phase str | None

The reported phase string, if any (e.g. "Liquid").

number int | None

The nPureOrMixtureDataNumber identifier, if present.

Methods:

Name Description
values

All values of one column, in row order.

find_column

First column matching the given filters (any combination).

temperature

Temperatures in kelvin (raises if the dataset has no temperature column).

pressure

Pressures converted to unit (default pascal).

mole_fraction

Mole fractions of component (by org_num).

to_dict

The table as {label: [values...]} (duplicate labels get a suffix).

labels property

labels: tuple[str, ...]

Column labels, in column order.

values

values(col: Column) -> tuple[float, ...]

All values of one column, in row order.

find_column

find_column(
    *,
    kind: str | None = None,
    quantity: str | None = None,
    component: int | None = None,
) -> Column | None

First column matching the given filters (any combination).

temperature

temperature() -> tuple[float, ...]

Temperatures in kelvin (raises if the dataset has no temperature column).

pressure

pressure(*, unit: str = 'Pa') -> tuple[float, ...]

Pressures converted to unit (default pascal).

Accepts pressure stored either as a controlled variable (ePressure) or as a measured property (e.g. a vapour-pressure column).
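The conversion itself amounts to a lookup of scale factors. The sketch below is an illustration: the set of units the library recognises is not listed here, so the table and the helper name convert_pressure are assumptions (the factors themselves are the standard definitions).

```python
# Pascals per unit; the unit spellings are assumed, the factors are standard.
TO_PASCAL = {
    "Pa": 1.0,
    "kPa": 1.0e3,
    "MPa": 1.0e6,
    "bar": 1.0e5,
    "atm": 101325.0,
}


def convert_pressure(values, from_unit, to_unit="Pa"):
    """Convert a sequence of pressures between two known units."""
    scale = TO_PASCAL[from_unit] / TO_PASCAL[to_unit]
    return tuple(v * scale for v in values)


in_pa = convert_pressure((101.325, 50.0), "kPa")
in_bar = convert_pressure((2.0e5,), "Pa", "bar")
```

An unknown unit raises KeyError in this sketch, which is preferable to silently returning unconverted numbers.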

mole_fraction

mole_fraction(component: int) -> tuple[float, ...]

Mole fractions of component (by org_num).

to_dict

to_dict() -> dict[str, list[float]]

The table as {label: [values...]} (duplicate labels get a suffix).

ThermoMLData dataclass

ThermoMLData(
    compounds: tuple[Compound, ...],
    datasets: tuple[Dataset, ...],
    citation: str | None = None,
)

A parsed ThermoML document: its compounds, datasets, and citation.

Methods:

Name Description
compound

The compound with the given org_num (raises KeyError if absent).

component_names

Best-effort names of a dataset's components (falls back to C{org_num}).

compound

compound(org_num: int) -> Compound

The compound with the given org_num (raises KeyError if absent).

component_names

component_names(dataset: Dataset) -> list[str]

Best-effort names of a dataset's components (falls back to C{org_num}).

loads

loads(text: str | bytes) -> ThermoMLData

Parse a ThermoML document from an in-memory string or bytes.

read_thermoml

read_thermoml(
    source: str | Path | IO[bytes] | IO[str],
) -> ThermoMLData

Parse a ThermoML document from a path or open file object.

Parameters:

Name Type Description Default
source str | Path | IO[bytes] | IO[str]

A filesystem path (str/Path) or a readable file object containing ThermoML XML.

required

Returns:

Type Description
ThermoMLData

The parsed ThermoMLData.

list_samples

list_samples() -> list[str]

Names (without extension) of the bundled ThermoML sample datasets.

sample_path

sample_path(name: str) -> Path

Filesystem path of a bundled sample (with or without the .xml suffix).

load_sample

load_sample(name: str) -> ThermoMLData

Parse a bundled ThermoML sample by name (see list_samples).