lyprox.dataexplorer.query

module documentation

Querying and generating statistics from the table of patients.

In this module, we define the classes and methods to filter and query the table, as well as compute statistics from a queried/filtered patient table to be displayed on the main dashboard of LyProX.

In the views, the execute_query function is called with the cleaned data from the DataexplorerForm. It builds a combined query from the lydata.accessor.C objects provided by lydata. These objects allow arbitrary combinations of deferred queries to be assembled first and executed only later.
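The idea of deferred, combinable queries can be illustrated with a small stand-alone sketch. The `Condition` class below is hypothetical and is not the lydata implementation. It only shows how comparisons can be stored, combined with `&`, `|`, and `~`, and evaluated later, here on a plain list of row dictionaries rather than a patient table.

```python
from __future__ import annotations

import operator
from typing import Any, Callable

Row = dict[str, Any]


class Condition:
    """Hypothetical deferred condition; NOT the lydata.accessor.C class."""

    def __init__(self, predicate: Callable[[Row], bool]) -> None:
        self.predicate = predicate

    @classmethod
    def compare(
        cls, column: str, op: Callable[[Any, Any], bool], value: Any
    ) -> Condition:
        # Nothing is evaluated here; the comparison is only stored.
        return cls(lambda row: op(row[column], value))

    def __and__(self, other: Condition) -> Condition:
        return Condition(lambda row: self.predicate(row) and other.predicate(row))

    def __or__(self, other: Condition) -> Condition:
        return Condition(lambda row: self.predicate(row) or other.predicate(row))

    def __invert__(self) -> Condition:
        return Condition(lambda row: not self.predicate(row))

    def execute(self, rows: list[Row]) -> list[Row]:
        # Only now is the combined condition applied to the data.
        return [row for row in rows if self.predicate(row)]


# Made-up example rows, for illustration only.
patients = [
    {"age": 61, "smoker": True, "t_stage": 2},
    {"age": 45, "smoker": False, "t_stage": 1},
    {"age": 70, "smoker": False, "t_stage": 3},
]

# Build the query first ...
query = Condition.compare("age", operator.gt, 50) & ~Condition.compare(
    "smoker", operator.eq, True
)
# ... and execute it later.
selected = query.execute(patients)
```

Here `selected` contains only the 70-year-old non-smoker. Separating construction from execution means the pieces of a query can be built independently, for example one per form section, and combined before the table is touched.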

After executing the query, the filtered dataset is used to compute Statistics using the from_table classmethod. This pydantic.BaseModel has similar fields to the DataexplorerForm and is used to display the aggregated information of the filtered patient table in the dashboard.

Class	`BaseStatistics`	Basic statistics to be computed and displayed on the dashboard.
Function	`assemble_selected_modalities`	Turn a list of modality names into a dictionary of modality configurations.
Function	`execute_query`	Execute the query defined by the `DataexplorerForm`.
Function	`get_lnl_query`	Create a query for the LNLs based on the cleaned form data.
Function	`get_risk_factor_query`	Create a query for the risk factors based on the cleaned form data.
Function	`join_dataset_tables`	Join the tables of the selected datasets into a single table.
Function	`make_ensure_keys_validator`	Create an `AfterValidator` to ensure all `keys` are present in the data.
Function	`safe_value_counts`	Return the value counts of a column, including missing values as `None`.
Type Variable	`KT`	Undocumented
Type Variable	`T`	Undocumented
Type Alias	`EnsureKeysSignature`	Undocumented
Variable	`lnl_fields`	LNL fields, dynamically created for unpacking in the `pydantic.create_model` call.
Variable	`logger`	Undocumented
Variable	`NullableBoolCounts`	Keys may be `True`, `False`, or `None`, while values are the counts of each.
Variable	`SexCounts`	Keys are `male` and `female`, values are the respective counts.
Variable	`Statistics`	Statistics to be computed and displayed on the dashboard.
Variable	`SubsiteCounts`	Keys are the subsite ICD codes, values are the counts of each.
Variable	`TStageCounts`	Keys are the T-stages, values are the counts of each.

def assemble_selected_modalities(names: list[str]) -> dict[str, lyutils.ModalityConfig]:

Turn a list of modality names into a dictionary of modality configurations.

def execute_query(cleaned_form_data: dict[str, Any]) -> pd.DataFrame:

Execute the query defined by the DataexplorerForm.

After a DataexplorerForm has been validated by calling form.is_valid(), the cleaned data is accessible as the attribute form.cleaned_data. This dictionary should be passed to this function as the cleaned_form_data argument.

Based on this cleaned form data, the involvement data from different modalities is combined using the lydata accessor method lydata.accessor.LyDataAccessor.combine. Then, a query is created using the lydata.accessor.C objects and executed on the dataset using the lydata.accessor.LyDataAccessor.query method. The resulting filtered dataset is returned.

def get_lnl_query(cleaned_form: dict[str, Any]) -> QTypes:

Create a query for the LNLs based on the cleaned form data.

def get_risk_factor_query(cleaned_form: dict[str, Any]) -> QTypes:

Create a query for the risk factors based on the cleaned form data.

def join_dataset_tables(datasets: QuerySet | Sequence[DatasetModel], method: Literal['max_llh', 'rank'] = 'max_llh') -> pd.DataFrame:

Join the tables of the selected datasets into a single table.

This iterates through the datasets and loads their respective pd.DataFrame tables. It also adds a column ["dataset", "info", "name"] to the table to keep track of which dataset a row belongs to. Finally, it concatenates all tables into a single table and returns it.

If datasets is empty, an equally empty table is created with all the columns necessary to create a Statistics object. These columns are in turn constructed from the schema of the lydata.validator module.
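The tagging and concatenation steps can be sketched with plain pandas. This is a minimal illustration, not the actual implementation: the helper `join_tables`, its `tables` mapping, and the `fallback_columns` argument are hypothetical, and the column names other than ("dataset", "info", "name") are made up. In particular, the real function derives the columns of the empty table from the lydata.validator schema rather than receiving them as an argument.

```python
import pandas as pd


def join_tables(
    tables: dict[str, pd.DataFrame],
    fallback_columns: pd.MultiIndex,
) -> pd.DataFrame:
    """Hypothetical helper: tag each table with its name, then concatenate."""
    tagged = []
    for name, table in tables.items():
        table = table.copy()  # do not modify the caller's table
        # Remember which dataset every row came from.
        table[("dataset", "info", "name")] = name
        tagged.append(table)

    if not tagged:
        # No datasets: return an empty table that still has the expected columns.
        return pd.DataFrame(columns=fallback_columns)

    return pd.concat(tagged, ignore_index=True)


# Made-up three-level columns, for illustration only.
data_columns = pd.MultiIndex.from_tuples(
    [("patient", "#", "age"), ("tumor", "1", "t_stage")]
)
fallback_columns = pd.MultiIndex.from_tuples(
    [
        ("patient", "#", "age"),
        ("tumor", "1", "t_stage"),
        ("dataset", "info", "name"),
    ]
)

first = pd.DataFrame([[61, 2], [45, 1]], columns=data_columns)
second = pd.DataFrame([[70, 3]], columns=data_columns)

joined = join_tables({"first": first, "second": second}, fallback_columns)
empty = join_tables({}, fallback_columns)
```

Returning an empty table with the full set of columns means downstream code that computes statistics can treat "no datasets selected" like any other input, without a special case.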

def make_ensure_keys_validator(keys: list[KT]) -> EnsureKeysSignature:

Create an AfterValidator to ensure all keys are present in the data.

This creates a function that can be used with pydantic's AfterValidator to ensure that all keys are present in the validated data. pydantic first receives the value counts from the safe_value_counts function, validates them, and then calls the function created by this factory to ensure that all keys are present.
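The factory pattern described here can be sketched without pydantic. The function `make_ensure_keys` below is a hypothetical stand-in, and it assumes that missing keys are filled in with a count of 0 while keys that are already present are kept unchanged. The documentation above does not state how the real validator handles these cases, so treat both as assumptions.

```python
from typing import Callable, TypeVar

KT = TypeVar("KT")


def make_ensure_keys(keys: list[KT]) -> Callable[[dict[KT, int]], dict[KT, int]]:
    """Hypothetical factory returning a function that completes a counts dict."""

    def ensure_keys(counts: dict[KT, int]) -> dict[KT, int]:
        # Assumption: a key without any occurrences gets a count of 0.
        completed = {key: 0 for key in keys}
        # Existing counts (including keys outside `keys`) are kept as they are.
        completed.update(counts)
        return completed

    return ensure_keys


ensure_sexes = make_ensure_keys(["male", "female"])
completed = ensure_sexes({"male": 3})
```

With pydantic, a function like `ensure_sexes` would typically be attached to a field type via `Annotated[dict[str, int], AfterValidator(ensure_sexes)]`, so that it runs once the dictionary itself has passed validation.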

def safe_value_counts(column: pd.Series) -> dict[Any, int]:

Return the value counts of a column, including missing values as None.

>>> column = pd.Series(['a', 'b', 'c', np.nan, 'a', 'b', 'c', 'a', 'b', 'c'])
>>> safe_value_counts(column)
{'a': 3, 'b': 3, 'c': 3, None: 1}
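One way such a function could be written is sketched below. The name `value_counts_with_none` is hypothetical and the actual implementation may differ. The sketch relies on `Series.value_counts(dropna=False)` to keep missing values and then replaces any missing key with `None`.

```python
from typing import Any

import numpy as np
import pandas as pd


def value_counts_with_none(column: pd.Series) -> dict[Any, int]:
    """Hypothetical sketch: count values, reporting missing ones under None."""
    # dropna=False keeps NaN as its own entry instead of discarding it.
    counts = column.value_counts(dropna=False)
    return {
        (None if pd.isna(key) else key): int(count)
        for key, count in counts.items()
    }


column = pd.Series(["a", "b", "c", np.nan, "a", "b", "c", "a", "b", "c"])
result = value_counts_with_none(column)
```

Using `None` rather than NaN as the key is convenient because NaN does not compare equal to itself, which makes it awkward to look up in a dictionary.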

KT =

Undocumented

Value

TypeVar('KT')

T =

Undocumented

Value

TypeVar('T',
        bound='BaseStatistics')

EnsureKeysSignature =

Undocumented

Value

Callable[[dict[KT, int]], dict[KT, int]]

lnl_fields =

LNL fields, dynamically created for unpacking in the pydantic.create_model call.

logger =

Undocumented

NullableBoolCounts =

Keys may be True, False, or None, while values are the counts of each.

SexCounts =

Keys are male and female, values are the respective counts.

Statistics =

Statistics to be computed and displayed on the dashboard.

This class extends the BaseStatistics class by adding the dynamically created fields for the LNLs, so that they do not have to be written out by hand.
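Extending a base model with dynamically generated fields can be sketched with `pydantic.create_model`. Everything specific in this example is hypothetical: the base class `BaseStats`, the LNL names, the `ipsi_`/`contra_` field naming, and the `int` field type are placeholders and do not reflect the real BaseStatistics or lnl_fields definitions.

```python
from pydantic import BaseModel, create_model


class BaseStats(BaseModel):
    """Hypothetical stand-in for BaseStatistics."""

    total: int = 0


# Hypothetical LNL names and sides; the real ones come from the application.
lnls = ["I", "II", "III"]
sides = ["ipsi", "contra"]

# Each entry maps a field name to a (type, default) tuple, which is the
# form create_model expects for field definitions.
lnl_fields = {
    f"{side}_{lnl}": (int, 0)
    for side in sides
    for lnl in lnls
}

# Build a subclass of BaseStats that carries the generated fields.
Stats = create_model("Stats", __base__=BaseStats, **lnl_fields)

stats = Stats(total=10, ipsi_II=4)
```

The generated class behaves like a hand-written subclass: `stats.ipsi_II` holds the supplied value and the remaining generated fields fall back to their default.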

The intended use is to first query a table of patients using the execute_query function with the cleaned form data from the DataexplorerForm. Then, pass the queried table to this class's from_table method to compute the statistics. Finally, pass the computed statistics to the context of the dataexplorer.views to be displayed in the rendered HTML or JSON response.

By design, this class's fields mirror the fields of the DataexplorerForm class: any piece of information the data can be queried on is also one that statistics can be computed on.

SubsiteCounts =

Keys are the subsite ICD codes, values are the counts of each.

TStageCounts =

Keys are the T-stages, values are the counts of each.