Regression & ThermoML¶

Differentiable parameter estimation (Levenberg-Marquardt and gradient descent), UNIFAC-to-binary prediction, the bundled ThermoML parameter bank, and the NIST ThermoML archive reader.

Regression¶

regression ¶

Differentiable parameter estimation for activity-coefficient models.

Because every equilibrium output is differentiable with respect to the activity-model parameters (see fugacio.thermo.gammaphi, fugacio.thermo.lle), fitting a model to data is plain gradient-based optimisation: no finite-difference parameter sweeps, no black-box derivatives. This module supplies:

two self-contained optimisers that operate on an arbitrary parameter pytree: levenberg_marquardt for nonlinear least squares (Gauss-Newton Hessian built from the exact Jacobian, with adaptive damping) and a simple gradient_descent;
residual builders that turn experimental data into a residual vector: bubble_pressure_residuals (isothermal/isobaric P-x-y VLE), activity_residuals (measured ln gamma), and lle_residuals (mutual-solubility / tie-line data); and
convenience fitters (fit_nrtl_binary, fit_uniquac_binary) that wire a model factory to the optimiser and return a ready model object.

A "model factory" is any theta -> ActivityModel mapping; the optimiser fits theta (the differentiable leaves you choose to expose), so you control which parameters are free and which are fixed.
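The factory idea can be illustrated without the library. The sketch below is a toy under stated assumptions: `ToyNRTL`, `make_factory`, and `make_symmetric_factory` are hypothetical names, not Fugacio classes or functions. It only shows how the shape of `theta` decides which parameters an optimiser may move, while everything closed over by the factory stays fixed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToyNRTL:
    """Hypothetical stand-in for an activity model (not a Fugacio class)."""
    b12: float
    b21: float
    alpha: float


def make_factory(alpha: float = 0.3):
    """Two free leaves (b12, b21); alpha is closed over and therefore fixed."""
    def make_model(theta: dict) -> ToyNRTL:
        return ToyNRTL(b12=theta["b12"], b21=theta["b21"], alpha=alpha)
    return make_model


def make_symmetric_factory(alpha: float = 0.3):
    """One free leaf: both coefficients are tied to theta['b']."""
    def make_model(theta: dict) -> ToyNRTL:
        return ToyNRTL(b12=theta["b"], b21=theta["b"], alpha=alpha)
    return make_model


# Arbitrary placeholder numbers, chosen only to exercise the factories.
two_param = make_factory(alpha=0.3)({"b12": 150.0, "b21": -40.0})
one_param = make_symmetric_factory(alpha=0.2)({"b": 75.0})
```

The same data could be fitted with either factory; the second simply exposes a smaller `theta` to the optimiser.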

Functions:

Name	Description
`levenberg_marquardt`	Minimise `0.5 * sum(residual(theta)**2)` by Levenberg-Marquardt.
`gradient_descent`	Minimise a scalar `objective(theta)` by fixed-step gradient descent.
`bubble_pressure_residuals`	Residuals of predicted vs. measured bubble pressure (and optionally vapour).
`activity_residuals`	Residuals of predicted vs. measured log activity coefficients.
`lle_residuals`	Isoactivity residuals at measured liquid-liquid tie-line ends.
`fit_nrtl_binary`	Fit binary NRTL `b` parameters (fixed `alpha`) to bubble-point data.
`unifac_ln_gamma_grid`	Sample (modified) UNIFAC `ln gamma` on a binary composition/temperature grid.
`predict_nrtl_from_unifac`	Predict binary NRTL `b` parameters by fitting to UNIFAC activity coefficients.
`predict_uniquac_from_unifac`	Predict binary UNIQUAC `b` parameters by fitting to UNIFAC activity coefficients.
`fit_uniquac_binary`	Fit binary UNIQUAC `b` parameters (with given `r`, `q`) to bubble-point data.

levenberg_marquardt ¶

levenberg_marquardt(
    residual: ResidualFn,
    theta0: Any,
    *,
    max_iter: int = 100,
    lambda0: float = 0.01,
    factor: float = 5.0,
    tol: float = 1e-12,
) -> tuple[Any, Array]

Minimise 0.5 * sum(residual(theta)**2) by Levenberg-Marquardt.

A trust-region blend of Gauss-Newton and gradient descent: each step solves (J^T J + lambda diag(J^T J)) delta = -J^T r with the exact Jacobian J (via jax.jacobian), shrinking lambda after an accepted step and growing it after a rejected one. Operates on any parameter pytree theta0.
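To make the damping logic concrete, here is a minimal pure-Python sketch of the step described above, restricted to exactly two parameters so the linear solve fits in a few lines. It is not the library's implementation: the name `lm_two_params` is hypothetical, the Jacobian is written by hand where the real routine uses `jax.jacobian`, and the stopping rule (cost improvement below `tol`) is an assumption about what `tol` controls.

```python
import math


def lm_two_params(residual_and_jacobian, theta0, *, max_iter=100,
                  lambda0=0.01, factor=5.0, tol=1e-12):
    """Toy Levenberg-Marquardt loop for exactly two parameters."""
    theta = list(theta0)
    lam = lambda0
    r, jac = residual_and_jacobian(theta)
    cost = 0.5 * sum(ri * ri for ri in r)
    for _ in range(max_iter):
        # Entries of A = J^T J and g = J^T r.
        a00 = sum(row[0] * row[0] for row in jac)
        a01 = sum(row[0] * row[1] for row in jac)
        a11 = sum(row[1] * row[1] for row in jac)
        g0 = sum(row[0] * ri for row, ri in zip(jac, r))
        g1 = sum(row[1] * ri for row, ri in zip(jac, r))
        # Damped system (A + lam * diag(A)) delta = -g, solved by Cramer's rule.
        d00, d11 = a00 * (1.0 + lam), a11 * (1.0 + lam)
        det = d00 * d11 - a01 * a01
        if det == 0.0:
            break
        delta = ((-g0 * d11 + a01 * g1) / det, (-g1 * d00 + a01 * g0) / det)
        trial = [theta[0] + delta[0], theta[1] + delta[1]]
        r_trial, jac_trial = residual_and_jacobian(trial)
        cost_trial = 0.5 * sum(ri * ri for ri in r_trial)
        if cost_trial < cost:
            # Accepted step: keep it and shrink the damping.
            improvement = cost - cost_trial
            theta, r, jac, cost = trial, r_trial, jac_trial, cost_trial
            lam /= factor
            if improvement < tol:
                break
        else:
            # Rejected step: grow the damping and try again.
            lam *= factor
    return theta, cost


# Noise-free synthetic data from y = 2 * exp(-1.5 * x).
X = [i / 7 for i in range(8)]
Y = [2.0 * math.exp(-1.5 * x) for x in X]


def exp_model(theta):
    """Residuals and hand-written Jacobian of a * exp(b * x) - y."""
    a, b = theta
    e = [math.exp(b * x) for x in X]
    r = [a * ei - yi for ei, yi in zip(e, Y)]
    jac = [(ei, a * x * ei) for ei, x in zip(e, X)]
    return r, jac


theta_fit, cost_fit = lm_two_params(exp_model, (1.0, 0.0))
```

With a small `lambda` the step is close to Gauss-Newton; with a large `lambda` it approaches a short, diagonally scaled gradient step, which is why rejected steps increase the damping.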

Returns:

Type	Description
`tuple[Any, Array]`	`(theta, cost)`: the fitted parameter pytree and the final half-sum-of-squares cost.

gradient_descent ¶

gradient_descent(
    objective: Callable[[Any], Array],
    theta0: Any,
    *,
    learning_rate: float = 0.01,
    max_iter: int = 500,
) -> tuple[Any, Array]

Minimise a scalar objective(theta) by fixed-step gradient descent.

A dependency-free fallback for objectives that are not least-squares; returns (theta, objective(theta)).
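The update rule is simple enough to sketch in a few lines. The version below is an illustration only: `gradient_descent_sketch` is a hypothetical name, and it takes a hand-written gradient function, whereas the documented routine takes the objective itself (presumably differentiating it automatically).

```python
def gradient_descent_sketch(grad, theta0, *, learning_rate=0.01, max_iter=500):
    """Fixed-step gradient descent on a flat list of parameters."""
    theta = list(theta0)
    for _ in range(max_iter):
        g = grad(theta)
        theta = [t - learning_rate * gi for t, gi in zip(theta, g)]
    return theta


def objective(theta):
    """A convex test objective with its minimum at (1.0, -0.5)."""
    return (theta[0] - 1.0) ** 2 + 2.0 * (theta[1] + 0.5) ** 2


def objective_grad(theta):
    """Analytic gradient of `objective`."""
    return [2.0 * (theta[0] - 1.0), 4.0 * (theta[1] + 0.5)]


theta_min = gradient_descent_sketch(objective_grad, [0.0, 0.0], learning_rate=0.1)
```

Because the step size never adapts, the learning rate must be small enough for the steepest direction of the objective; for least-squares problems `levenberg_marquardt` is usually the better choice.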

bubble_pressure_residuals ¶

bubble_pressure_residuals(
    make_model: ModelFactory,
    t: Array,
    x: Array,
    p_exp: Array,
    tc: Array,
    pc: Array,
    omega: Array,
    *,
    y_exp: Array | None = None,
    p_scale: ArrayLike | None = None,
    y_weight: float = 1.0,
    **opts: Any,
) -> ResidualFn

Residuals of predicted vs. measured bubble pressure (and optionally vapour).

Parameters:

Name	Type	Description	Default
`make_model`	`ModelFactory`	`theta -> ActivityModel` factory.	required
`t`	`Array`	Temperatures (K), shape `(m,)`.	required
`x`	`Array`	Liquid compositions, shape `(m, n)`.	required
`p_exp`	`Array`	Measured bubble pressures (Pa), shape `(m,)`.	required
`tc`	`Array`	Component critical temperatures (K).	required
`pc`	`Array`	Component critical pressures (Pa).	required
`omega`	`Array`	Component acentric factors.	required
`y_exp`	`Array \| None`	Optional measured vapour compositions, shape `(m, n)`.	`None`
`p_scale`	`ArrayLike \| None`	Pressure normaliser (defaults to `mean(p_exp)`).	`None`
`y_weight`	`float`	Relative weight on the vapour-composition residuals.	`1.0`
`**opts`	`Any`	Forwarded to `bubble_pressure_gamma` (`vapor`, `poynting`, ...).	`{}`

Returns:

Type	Description
`ResidualFn`	`residual(theta) -> 1-D array` for use with `levenberg_marquardt`.
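The following sketch shows one plausible way such a residual closure can be assembled. It rests on several assumptions: the helper names are hypothetical, the toy physics is modified Raoult's law with a one-parameter Margules model and fixed saturation pressures (the real function works from `tc`, `pc`, `omega` and forwards options to `bubble_pressure_gamma`), and the ordering of the residual vector (scaled pressures first, weighted vapour fractions after) is a guess.

```python
import math


def toy_bubble_point(a, x1, psat):
    """Modified Raoult's law with a one-parameter Margules model (toy)."""
    x2 = 1.0 - x1
    gamma1 = math.exp(a * x2 * x2)
    gamma2 = math.exp(a * x1 * x1)
    p = x1 * gamma1 * psat[0] + x2 * gamma2 * psat[1]
    y1 = x1 * gamma1 * psat[0] / p
    return p, y1


def make_residual(x1_data, p_exp, psat, *, y1_exp=None, p_scale=None,
                  y_weight=1.0):
    """Return residual(theta) -> flat list of dimensionless residuals."""
    if p_scale is None:
        p_scale = sum(p_exp) / len(p_exp)  # mirrors the documented default

    def residual(theta):
        (a,) = theta
        preds = [toy_bubble_point(a, x1, psat) for x1 in x1_data]
        # Pressure residuals, normalised so they are comparable to mole fractions.
        res = [(p - pe) / p_scale for (p, _), pe in zip(preds, p_exp)]
        if y1_exp is not None:
            # Optional vapour-composition residuals with a relative weight.
            res += [y_weight * (y1 - ye) for (_, y1), ye in zip(preds, y1_exp)]
        return res

    return residual


# Synthetic "measurements" generated by the toy model itself with a = 0.8.
PSAT = (80_000.0, 40_000.0)
X1 = [0.1, 0.3, 0.5, 0.7, 0.9]
synthetic = [toy_bubble_point(0.8, x1, PSAT) for x1 in X1]
P_EXP = [p for p, _ in synthetic]
Y1_EXP = [y for _, y in synthetic]
residual = make_residual(X1, P_EXP, PSAT, y1_exp=Y1_EXP, y_weight=2.0)
```

Dividing by `p_scale` keeps the pressure residuals of order one, so `y_weight` can balance them against the vapour-composition residuals without unit juggling.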

activity_residuals ¶

activity_residuals(
    make_model: ModelFactory,
    t: Array,
    x: Array,
    ln_gamma_exp: Array,
) -> ResidualFn

Residuals of predicted vs. measured log activity coefficients.

t is shape (m,), x and ln_gamma_exp are shape (m, n).

lle_residuals ¶

lle_residuals(
    make_model: ModelFactory,
    t: Array,
    x_i_exp: Array,
    x_ii_exp: Array,
) -> ResidualFn

Isoactivity residuals at measured liquid-liquid tie-line ends.

A consistent model makes each experimental conjugate pair iso-active: x_i^I gamma_i^I = x_i^II gamma_i^II. t is (m,); the compositions are (m, n).
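As a hedged illustration of the isoactivity condition, the sketch below evaluates it for a symmetric one-parameter Margules model, for which the consistent parameter of a tie line `(s, 1 - s)` is known in closed form. The function names are hypothetical, and the logarithmic form of the residual, ln(x_i gamma_i) in phase I minus the same quantity in phase II, is an assumption; the library may use the product form written above.

```python
import math


def margules_ln_gamma(a, x1):
    """ln gamma of both components for the symmetric Margules model."""
    x2 = 1.0 - x1
    return (a * x2 * x2, a * x1 * x1)


def isoactivity_residuals(a, x1_phase_i, x1_phase_ii):
    """ln(x_i gamma_i) in phase I minus phase II, per tie line and component."""
    res = []
    for xi, xii in zip(x1_phase_i, x1_phase_ii):
        ln_g_i = margules_ln_gamma(a, xi)
        ln_g_ii = margules_ln_gamma(a, xii)
        comps_i = (xi, 1.0 - xi)
        comps_ii = (xii, 1.0 - xii)
        for k in range(2):
            res.append(math.log(comps_i[k]) + ln_g_i[k]
                       - math.log(comps_ii[k]) - ln_g_ii[k])
    return res


# For the symmetric model, a tie line (s, 1 - s) is iso-active exactly when
# a = ln((1 - s) / s) / (1 - 2 s).
s = 0.1
a_exact = math.log((1.0 - s) / s) / (1.0 - 2.0 * s)
tie_line = isoactivity_residuals(a_exact, [s], [1.0 - s])
```

A fitter drives these residuals to zero over all measured tie lines; no flash calculation is needed inside the loop because the conjugate compositions are taken from the data.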

fit_nrtl_binary ¶

fit_nrtl_binary(
    t: Array,
    x: Array,
    p_exp: Array,
    tc: Array,
    pc: Array,
    omega: Array,
    *,
    alpha: float = 0.3,
    y_exp: Array | None = None,
    b0: tuple[float, float] = (0.0, 0.0),
    max_iter: int = 80,
    **opts: Any,
) -> tuple[NRTL, Array]

Fit binary NRTL b parameters (fixed alpha) to bubble-point data.

The free parameters are the two 1/T interaction coefficients b12, b21 (Kelvin); a = 0 and the non-randomness alpha are held fixed. Returns (fitted NRTL, final cost).
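For orientation, the textbook binary NRTL equations with `a = 0` (so `tau = b / T`) can be written out as below. This is a sketch, not Fugacio's code: the function name is hypothetical, and the index convention (`tau12 = b12 / T`, `G12 = exp(-alpha * tau12)`) is the common textbook one and is assumed, not confirmed, to match the library.

```python
import math


def nrtl_binary_ln_gamma(b12, b21, alpha, t, x1):
    """ln gamma_1, ln gamma_2 of a binary NRTL model with tau_ij = b_ij / T."""
    x2 = 1.0 - x1
    tau12, tau21 = b12 / t, b21 / t
    g12, g21 = math.exp(-alpha * tau12), math.exp(-alpha * tau21)
    ln_g1 = x2 * x2 * (tau21 * (g21 / (x1 + x2 * g21)) ** 2
                       + tau12 * g12 / (x2 + x1 * g12) ** 2)
    ln_g2 = x1 * x1 * (tau12 * (g12 / (x2 + x1 * g12)) ** 2
                       + tau21 * g21 / (x1 + x2 * g21) ** 2)
    return ln_g1, ln_g2


# Arbitrary placeholder parameters (K), used only to exercise the formula.
B12, B21, ALPHA, T = 300.0, 500.0, 0.3, 350.0
ideal = nrtl_binary_ln_gamma(0.0, 0.0, ALPHA, T, 0.4)
dilute = nrtl_binary_ln_gamma(B12, B21, ALPHA, T, 0.0)
```

Two checks follow directly from the equations: zero `b` values give an ideal solution, and at infinite dilution of component 1 the result reduces to `tau21 + tau12 * G12`.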

unifac_ln_gamma_grid ¶

unifac_ln_gamma_grid(
    components: list[str],
    t: ArrayLike,
    *,
    points: int = 11,
    dortmund: bool = False,
    x_min: float = 0.02,
) -> tuple[Array, Array, Array]

Sample (modified) UNIFAC ln gamma on a binary composition/temperature grid.

This turns a predictive group-contribution model into pseudo-data for fitting a correlative NRTL/UNIQUAC model, the standard way to obtain binary interaction parameters for a pair that has no measured VLE.

Parameters:

Name	Type	Description	Default
`components`	`list[str]`	Exactly two component names with UNIFAC group assignments.	required
`t`	`ArrayLike`	Temperature(s) (K); a scalar or 1-D array. The grid is the outer product of the temperatures with the composition samples.	required
`points`	`int`	Number of liquid compositions sampled in `(x_min, 1 - x_min)`.	`11`
`dortmund`	`bool`	Use modified UNIFAC (Dortmund) instead of classic UNIFAC.	`False`
`x_min`	`float`	Smallest mole fraction sampled (kept away from the pure limits).	`0.02`

Returns:

Type	Description
`tuple[Array, Array, Array]`	`(t_grid, x_grid, ln_gamma_grid)` with shapes `(m,)`, `(m, 2)`, `(m, 2)`, where `m = points * n_temperatures`.

Raises:

Type	Description
`ValueError`	if `components` is not a binary pair.
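The grid layout described above can be sketched without UNIFAC itself. The helper below is hypothetical and only builds the temperature and composition arrays; it assumes evenly spaced samples that include both end points `x_min` and `1 - x_min`, and a temperature-major ordering, neither of which is confirmed by the reference.

```python
from itertools import product


def composition_temperature_grid(temperatures, *, points=11, x_min=0.02):
    """Outer product of temperatures with binary composition samples."""
    step = (1.0 - 2.0 * x_min) / (points - 1)  # requires points >= 2
    x1_samples = [x_min + k * step for k in range(points)]
    t_grid, x_grid = [], []
    for t, x1 in product(temperatures, x1_samples):
        t_grid.append(t)
        x_grid.append((x1, 1.0 - x1))
    return t_grid, x_grid


t_grid, x_grid = composition_temperature_grid([298.15, 323.15])
```

Each row pairs one temperature with one composition, so two temperatures and eleven compositions give `m = 22` rows; evaluating the group-contribution model on every row would supply the matching `ln_gamma_grid`.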

predict_nrtl_from_unifac ¶

predict_nrtl_from_unifac(
    components: list[str],
    t: ArrayLike,
    *,
    alpha: float = 0.3,
    dortmund: bool = False,
    points: int = 11,
    x_min: float = 0.02,
    b0: tuple[float, float] = (0.0, 0.0),
    max_iter: int = 120,
) -> tuple[NRTL, Array]

Predict binary NRTL b parameters by fitting to UNIFAC activity coefficients.

UNIFAC supplies ln gamma over a composition (and temperature) grid; the two NRTL 1/T coefficients b12, b21 (with a = 0 and fixed alpha) are fitted to it by levenberg_marquardt. Use it to bootstrap a correlative model for a pair without measured data.

Returns:

Type	Description
`tuple[NRTL, Array]`	`(fitted NRTL, final cost)`.

predict_uniquac_from_unifac ¶

predict_uniquac_from_unifac(
    components: list[str],
    t: ArrayLike,
    *,
    r: Array | None = None,
    q: Array | None = None,
    dortmund: bool = False,
    points: int = 11,
    x_min: float = 0.02,
    b0: tuple[float, float] = (0.0, 0.0),
    max_iter: int = 120,
) -> tuple[UNIQUAC, Array]

Predict binary UNIQUAC b parameters by fitting to UNIFAC activity coefficients.

Like predict_nrtl_from_unifac, but for UNIQUAC. The surface/volume parameters r, q default to the curated values (fugacio.thermo.data.uniquac_rq); the free parameters are the 1/T coefficients of tau = exp(b/T).

Returns:

Type	Description
`tuple[UNIQUAC, Array]`	`(fitted UNIQUAC, final cost)`.

fit_uniquac_binary ¶

fit_uniquac_binary(
    t: Array,
    x: Array,
    p_exp: Array,
    tc: Array,
    pc: Array,
    omega: Array,
    r: Array,
    q: Array,
    *,
    y_exp: Array | None = None,
    b0: tuple[float, float] = (0.0, 0.0),
    max_iter: int = 80,
    **opts: Any,
) -> tuple[UNIQUAC, Array]

Fit binary UNIQUAC b parameters (with given r, q) to bubble-point data.

Free parameters are the 1/T coefficients of ln tau = a + b/T with a = 0. Returns (fitted UNIQUAC, final cost).

Parameter bank¶

parameter_bank ¶

Batch regression of ThermoML datasets into a reusable binary-parameter bank.

This is the bridge between the ThermoML reader (fugacio.thermo.thermoml) and the differentiable fitters (fugacio.thermo.regression): point it at parsed documents (or the bundled samples) and it fits a binary activity model to every isothermal P-x VLE table it can understand, recording the fitted parameters together with their provenance: source dataset, temperature range, point count, and the scaled root-mean-square pressure residual. The collected FittedBinary records form a ParameterBank that hands back ready-to-use NRTL models with the component ordering you ask for.

A bank fitted to the bundled samples ships with the package (parameter_bank.json, regenerated by scripts/gen_parameter_bank.py); load it with ParameterBank.load_bundled. Components are matched to the Fugacio database by CAS number, never by name, so document-local naming quirks ("water (H2O)") cannot mis-assign a dataset.

Classes:

Name	Description
`FittedBinary`	A fitted binary interaction record with its provenance.
`ParameterBank`	A lookup table of fitted binary parameters, keyed by component pair.

Functions:

Name	Description
`fit_vle_dataset`	Fit binary NRTL `b` parameters to one parsed P-x VLE dataset.
`fit_bundled_samples`	Fit every binary P-x VLE dataset among the bundled ThermoML samples.

FittedBinary `dataclass` ¶

FittedBinary(
    components: tuple[str, str],
    model: str,
    alpha: float,
    b12: float,
    b21: float,
    t_min: float,
    t_max: float,
    n_points: int,
    rmse: float,
    source: str,
)

A fitted binary interaction record with its provenance.

Attributes:

Name	Type	Description
`components`	`tuple[str, str]`	Database names `(1, 2)` in the orientation of `b12`/`b21`.
`model`	`str`	Activity-model family (currently always `"nrtl"`).
`alpha`	`float`	The fixed NRTL non-randomness parameter used in the fit.
`b12`, `b21`	`float`	Fitted `tau = b/T` interaction coefficients (K).
`t_min`, `t_max`	`float`	Temperature range of the underlying data (K).
`n_points`	`int`	Number of data rows fitted.
`rmse`	`float`	Root-mean-square residual of `(P_pred - P_exp) / mean(P_exp)`.
`source`	`str`	Provenance string (sample name or document citation).
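The `rmse` definition in the table is a plain scaled root-mean-square. A minimal sketch, with a hypothetical function name and made-up pressures:

```python
import math


def scaled_pressure_rmse(p_pred, p_exp):
    """Root-mean-square of (P_pred - P_exp) / mean(P_exp)."""
    p_mean = sum(p_exp) / len(p_exp)
    scaled = [(pp - pe) / p_mean for pp, pe in zip(p_pred, p_exp)]
    return math.sqrt(sum(s * s for s in scaled) / len(scaled))


# Made-up numbers: the mean measured pressure is 200, so the scaled
# residuals are 0.01, -0.01 and 0.
rmse = scaled_pressure_rmse([102.0, 198.0, 300.0], [100.0, 200.0, 300.0])
```

Because the residual is divided by the mean measured pressure, the value is dimensionless and comparable across datasets recorded at very different pressure levels.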

Methods:

Name	Description
`nrtl`	The fitted model as a ready-to-evaluate two-component NRTL.

nrtl ¶

nrtl() -> NRTL

The fitted model as a ready-to-evaluate two-component NRTL.

ParameterBank ¶

ParameterBank(entries: list[FittedBinary])

A lookup table of fitted binary parameters, keyed by component pair.

Pairs are stored orientation-free: get and nrtl accept the components in either order and return parameters oriented as requested (swapping b12/b21 when needed).
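One way to obtain this behaviour is to key the table on an unordered pair and swap the coefficients on the way out. The sketch below is an assumption about the mechanism, not the library's code; `Record` and `ToyBank` are hypothetical, and the parameter values are arbitrary placeholders, not fitted results.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Record:
    """Hypothetical, reduced stand-in for a fitted binary record."""
    components: tuple
    b12: float
    b21: float


class ToyBank:
    """Orientation-free lookup keyed by the unordered component pair."""

    def __init__(self, records):
        self._by_pair = {frozenset(r.components): r for r in records}

    def get(self, component_1, component_2):
        rec = self._by_pair.get(frozenset((component_1, component_2)))
        if rec is None or rec.components == (component_1, component_2):
            return rec
        # Stored in the opposite orientation: swap the directed coefficients.
        return replace(rec, components=(component_1, component_2),
                       b12=rec.b21, b21=rec.b12)


bank = ToyBank([Record(("ethanol", "water"), b12=-50.0, b21=600.0)])
swapped = bank.get("water", "ethanol")
```

Swapping matters because `b12` and `b21` are directed quantities: reversing the component order without exchanging them would describe a different mixture model.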

Methods:

Name	Description
`get`	The record for a pair, reoriented so `component_1` is component 1.
`nrtl`	A ready NRTL model for the pair, in the requested component order.
`to_json`	Serialize the bank (sorted, human-diffable).
`from_json`	Inverse of `to_json`.
`load_bundled`	The bank fitted to the bundled ThermoML samples (ships with the package).

Attributes:

Name	Type	Description
`entries`	`list[FittedBinary]`	All records, sorted by component pair for stable iteration.

entries `property` ¶

entries: list[FittedBinary]

All records, sorted by component pair for stable iteration.

get ¶

get(
    component_1: str, component_2: str
) -> FittedBinary | None

The record for a pair, reoriented so component_1 is component 1.

nrtl ¶

nrtl(component_1: str, component_2: str) -> NRTL

A ready NRTL model for the pair, in the requested component order.

Raises:

Type	Description
`KeyError`	if the bank has no record for the pair.

to_json ¶

to_json() -> str

Serialize the bank (sorted, human-diffable).

from_json `classmethod` ¶

from_json(text: str) -> ParameterBank

Inverse of to_json.
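A sorted, diff-friendly round trip of this kind can be sketched with the standard `json` module. The record layout and helper names below are hypothetical and do not reflect the actual schema of `parameter_bank.json`; the point is only that sorting entries and keys makes the output independent of insertion order.

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class Entry:
    """Hypothetical, reduced record; the real schema has more fields."""
    components: tuple
    b12: float
    b21: float


def dump_entries(entries):
    """Serialise with sorted entries and keys so diffs stay stable."""
    rows = sorted((asdict(e) for e in entries),
                  key=lambda row: row["components"])
    return json.dumps(rows, indent=2, sort_keys=True)


def load_entries(text):
    """Inverse of dump_entries (JSON lists are turned back into tuples)."""
    return [Entry(tuple(row["components"]), row["b12"], row["b21"])
            for row in json.loads(text)]


# Arbitrary placeholder values.
entries = [Entry(("ethanol", "water"), 1.5, -2.0),
           Entry(("acetone", "methanol"), 0.25, 0.5)]
text = dump_entries(entries)
```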

load_bundled `classmethod` ¶

load_bundled() -> ParameterBank

The bank fitted to the bundled ThermoML samples (ships with the package).

fit_vle_dataset ¶

fit_vle_dataset(
    data: ThermoMLData,
    dataset: Dataset,
    *,
    alpha: float = 0.3,
    source: str = "",
    max_iter: int = 80,
) -> FittedBinary

Fit binary NRTL b parameters to one parsed P-x VLE dataset.

The dataset must be binary with a temperature column, a composition column for its first component, and a pressure column (the layout of every bundled VLE sample and of archive isothermal P-x tables).

Returns:

Type	Description
`FittedBinary`	The `FittedBinary` record, oriented so component 1 is the dataset's first component.

Raises:

Type	Description
`ValueError`	if the dataset is not a binary P-x table.
`KeyError`	if a compound cannot be matched to the component database.

fit_bundled_samples ¶

fit_bundled_samples(
    names: list[str] | None = None, *, alpha: float = 0.3
) -> list[FittedBinary]

Fit every binary P-x VLE dataset among the bundled ThermoML samples.

Non-VLE samples (pure-component tables, missing columns) are skipped, so the driver can be pointed at the whole sample directory. This is the batch regression behind the bundled parameter bank.

ThermoML reader¶

thermoml ¶

Reader for the NIST ThermoML archive XML format.

ThermoML (https://www.nist.gov/mml/acmd/trc/thermoml) is the IUPAC/NIST XML standard for thermophysical and thermochemical property data; the freely redistributable ThermoML Archive (https://www.nist.gov/mml/acmd/trc/thermoml/thermoml-archive) holds tens of thousands of experimental datasets. This module turns those files into tidy, typed tables you can feed straight into fugacio.thermo.regression, so that a model can be fitted to real measurements and its predictions graded against them.

The parser is deliberately tolerant and dependency-free (standard-library xml.etree.ElementTree only):

XML namespaces are stripped, so files declaring the ThermoML namespace (or none) parse identically;
compounds, mixtures, variables, properties, and the numeric value rows are read by their local element names, matching the published schema without binding to a specific version;
each Dataset exposes its columns as aligned numeric rows plus convenience accessors (Dataset.temperature, Dataset.pressure, Dataset.mole_fraction) with pressure unit conversion to pascal.
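The namespace-stripping approach from the first two points can be sketched with the standard library alone. This is an illustration, not the module's parser: the helper names are hypothetical, and the XML is a tiny hand-written fragment that borrows ThermoML element names but is not a complete, schema-valid document.

```python
import xml.etree.ElementTree as ET


def local_name(tag):
    """Drop a leading '{namespace}' from an ElementTree tag."""
    return tag.rsplit("}", 1)[-1]


def strip_namespaces(root):
    """Rewrite every element tag in place to its local name."""
    for elem in root.iter():
        if isinstance(elem.tag, str):  # comments and PIs have non-string tags
            elem.tag = local_name(elem.tag)
    return root


def compound_names(xml_text):
    """Map each compound's organization number to its common name."""
    root = strip_namespaces(ET.fromstring(xml_text))
    return {
        int(c.findtext("RegNum/nOrgNum")): c.findtext("sCommonName")
        for c in root.iter("Compound")
    }


FRAGMENT = """<DataReport{ns}>
  <Compound><RegNum><nOrgNum>1</nOrgNum></RegNum><sCommonName>ethanol</sCommonName></Compound>
  <Compound><RegNum><nOrgNum>2</nOrgNum></RegNum><sCommonName>water</sCommonName></Compound>
</DataReport>"""

with_ns = compound_names(
    FRAGMENT.format(ns=' xmlns="http://www.iupac.org/namespaces/ThermoML"'))
without_ns = compound_names(FRAGMENT.format(ns=""))
```

After stripping, the same path expressions work whether or not the file declares a default namespace, which is what makes the two parses identical.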

A couple of small, schema-faithful datasets ship with the package for tests and examples; see list_samples / load_sample.

Classes:

Name	Description
`Compound`	A chemical compound declared in a ThermoML document.
`Column`	One variable or property column of a `Dataset` table.
`Dataset`	A `PureOrMixtureData` block: a table of measurements for one mixture.
`ThermoMLData`	A parsed ThermoML document: its compounds, datasets, and citation.

Functions:

Name	Description
`loads`	Parse a ThermoML document from an in-memory string or bytes.
`read_thermoml`	Parse a ThermoML document from a path or open file object.
`list_samples`	Names (without extension) of the bundled ThermoML sample datasets.
`sample_path`	Filesystem path of a bundled sample (with or without the `.xml` suffix).
`load_sample`	Parse a bundled ThermoML sample by name (see `list_samples`).

Compound `dataclass` ¶

Compound(
    org_num: int,
    name: str | None = None,
    formula: str | None = None,
    cas: str | None = None,
    inchikey: str | None = None,
)

A chemical compound declared in a ThermoML document.

Attributes:

Name	Type	Description
`org_num`	`int`	The document-local organization number used to reference this compound from mixtures and composition variables.
`name`	`str \| None`	Common name, if given.
`formula`	`str \| None`	Molecular formula, if given.
`cas`	`str \| None`	CAS registry number, if present.
`inchikey`	`str \| None`	Standard InChIKey, if present.

Column `dataclass` ¶

Column(
    number: int,
    role: str,
    kind: str,
    label: str,
    component: int | None = None,
)

One variable or property column of a Dataset table.

Attributes:

Name	Type	Description
`number`	`int`	The `nVarNumber` / `nPropNumber` within the dataset.
`role`	`str`	`"variable"` (an independent, controlled quantity) or `"property"` (a measured quantity).
`kind`	`str`	The ThermoML type element local name, e.g. `"eTemperature"`, `"ePressure"`, `"eComponentComposition"`.
`label`	`str`	Human-readable label including units, e.g. `"Pressure, kPa"`.
`component`	`int \| None`	For composition columns, the `org_num` of the component the fraction refers to; otherwise `None`.

quantity `property` ¶

quantity: str

The label with any trailing unit removed.

unit `property` ¶

unit: str | None

The unit parsed from the label, if any.
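Given labels of the form `"Pressure, kPa"`, a split at the last comma is one simple way to derive both properties. The exact parsing rule used by `Column` is not stated here, so the hypothetical helper below is only a guess at the behaviour.

```python
def split_label(label):
    """Split 'Quantity, unit' at the last ', '; the unit is None if absent."""
    quantity, sep, unit = label.rpartition(", ")
    if not sep:
        return label, None
    return quantity, unit


pressure_parts = split_label("Pressure, kPa")
unitless_parts = split_label("Mole fraction")
```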

Dataset `dataclass` ¶

Dataset(
    components: tuple[int, ...],
    columns: tuple[Column, ...],
    rows: tuple[tuple[float, ...], ...],
    phase: str | None = None,
    number: int | None = None,
)

A PureOrMixtureData block: a table of measurements for one mixture.

The rows are aligned with columns; a missing cell is float('nan').

Attributes:

Name	Type	Description
`components`	`tuple[int, ...]`	`org_num` of each component participating, in document order.
`columns`	`tuple[Column, ...]`	The variable and property columns, in document order.
`rows`	`tuple[tuple[float, ...], ...]`	Numeric rows aligned with `columns`.
`phase`	`str \| None`	The reported phase string, if any (e.g. `"Liquid"`).
`number`	`int \| None`	The `nPureOrMixtureDataNumber` identifier, if present.

Methods:

Name	Description
`values`	All values of one column, in row order.
`find_column`	First column matching the given filters (any combination).
`temperature`	Temperatures in kelvin (raises if the dataset has no temperature column).
`pressure`	Pressures converted to `unit` (default pascal).
`mole_fraction`	Mole fractions of `component` (by `org_num`).
`to_dict`	The table as `{label: [values...]}` (duplicate labels get a suffix).

labels `property` ¶

labels: tuple[str, ...]

Column labels, in column order.

values ¶

values(col: Column) -> tuple[float, ...]

All values of one column, in row order.

find_column ¶

find_column(
    *,
    kind: str | None = None,
    quantity: str | None = None,
    component: int | None = None,
) -> Column | None

First column matching the given filters (any combination).

temperature ¶

temperature() -> tuple[float, ...]

Temperatures in kelvin (raises if the dataset has no temperature column).

pressure ¶

pressure(*, unit: str = 'Pa') -> tuple[float, ...]

Pressures converted to unit (default pascal).

Accepts pressure stored either as a controlled variable (ePressure) or as a measured property (e.g. a vapour-pressure column).
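Unit conversion of this kind usually goes through a table of factors to pascal. The sketch below is hypothetical: the set of units the reader actually accepts is not listed in this reference, so the table is an assumption, although the factors themselves are the standard definitions.

```python
# Factors that convert one unit of pressure to pascal.
TO_PASCAL = {"Pa": 1.0, "kPa": 1.0e3, "MPa": 1.0e6, "bar": 1.0e5,
             "atm": 101_325.0}


def convert_pressure(values, from_unit, to_unit="Pa"):
    """Convert a sequence of pressures between two known units."""
    scale = TO_PASCAL[from_unit] / TO_PASCAL[to_unit]
    return tuple(v * scale for v in values)


in_pascal = convert_pressure([101.325], "kPa")
in_kilopascal = convert_pressure([1.0], "bar", "kPa")
```

An unknown unit raises `KeyError` in this sketch, which fails loudly where a silent pass-through would return mis-scaled pressures.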

mole_fraction ¶

mole_fraction(component: int) -> tuple[float, ...]

Mole fractions of component (by org_num).

to_dict ¶

to_dict() -> dict[str, list[float]]

The table as {label: [values...]} (duplicate labels get a suffix).

ThermoMLData `dataclass` ¶

ThermoMLData(
    compounds: tuple[Compound, ...],
    datasets: tuple[Dataset, ...],
    citation: str | None = None,
)

A parsed ThermoML document: its compounds, datasets, and citation.

Methods:

Name	Description
`compound`	The compound with the given `org_num` (raises `KeyError` if absent).
`component_names`	Best-effort names of a dataset's components (falls back to `C{org_num}`).

compound ¶

compound(org_num: int) -> Compound

The compound with the given org_num (raises KeyError if absent).

component_names ¶

component_names(dataset: Dataset) -> list[str]

Best-effort names of a dataset's components (falls back to C{org_num}).

loads ¶

loads(text: str | bytes) -> ThermoMLData

Parse a ThermoML document from an in-memory string or bytes.

read_thermoml ¶

read_thermoml(
    source: str | Path | IO[bytes] | IO[str],
) -> ThermoMLData

Parse a ThermoML document from a path or open file object.

Parameters:

Name	Type	Description	Default
`source`	`str \| Path \| IO[bytes] \| IO[str]`	A filesystem path (`str`/`Path`) or a readable file object containing ThermoML XML.	required

Returns:

Type	Description
`ThermoMLData`	The parsed `ThermoMLData`.

list_samples ¶

list_samples() -> list[str]

Names (without extension) of the bundled ThermoML sample datasets.

sample_path ¶

sample_path(name: str) -> Path

Filesystem path of a bundled sample (with or without the .xml suffix).

load_sample ¶

load_sample(name: str) -> ThermoMLData

Parse a bundled ThermoML sample by name (see list_samples).
