Querying and generating statistics from the table of patients.
In this module, we define the classes and methods to filter and query the table, as well as compute statistics from a queried/filtered patient table to be displayed on the main dashboard of LyProX.
In the views, the execute_query function is called with the cleaned data from the DataexplorerForm. This execute_query function then creates a combined query using the lydata.accessor.C objects from lydata. These objects allow arbitrary combinations of deferred queries to be created and executed only later.
After executing the query, the filtered dataset is used to compute Statistics using the from_table classmethod. This pydantic.BaseModel has similar fields to the DataexplorerForm and is used to display the aggregated information of the filtered patient table in the dashboard.
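The idea of deferred queries can be illustrated with a minimal, standard-library-only sketch. It does not use or reproduce the lydata API: the class `DeferredQuery`, the helper `column_equals`, and the example rows are all hypothetical names chosen for illustration. The point is only that queries are combined first and touch the data when they are executed.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List

Row = Dict[str, Any]


@dataclass(frozen=True)
class DeferredQuery:
    """Hypothetical stand-in for a deferred query (NOT the lydata API)."""

    predicate: Callable[[Row], bool]

    def __and__(self, other: "DeferredQuery") -> "DeferredQuery":
        # Combining builds a new query; no data is inspected yet.
        return DeferredQuery(lambda row: self.predicate(row) and other.predicate(row))

    def __or__(self, other: "DeferredQuery") -> "DeferredQuery":
        return DeferredQuery(lambda row: self.predicate(row) or other.predicate(row))

    def __invert__(self) -> "DeferredQuery":
        return DeferredQuery(lambda row: not self.predicate(row))

    def execute(self, rows: List[Row]) -> List[Row]:
        # Only here is the combined predicate applied to the data.
        return [row for row in rows if self.predicate(row)]


def column_equals(column: str, value: Any) -> DeferredQuery:
    """Create a deferred equality check on one column (hypothetical helper)."""
    return DeferredQuery(lambda row: row.get(column) == value)


# Made-up example rows, purely for illustration.
patients = [
    {"sex": "female", "t_stage": 1, "smoker": True},
    {"sex": "male", "t_stage": 3, "smoker": True},
    {"sex": "male", "t_stage": 2, "smoker": False},
]

# Build the combined query first, execute it later.
query = column_equals("sex", "male") & ~column_equals("smoker", False)
filtered = query.execute(patients)
```

Because the combination step only composes predicates, the same combined query could be reused on any table with matching columns.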
Class | | Basic statistics to be computed and displayed on the dashboard. |
Function | assemble | Turn a list of modality names into a dictionary of modality configurations. |
Function | execute | Execute the query defined by the DataexplorerForm. |
Function | get | Create a query for the LNLs based on the cleaned form data. |
Function | get | Create a query for the risk factors based on the cleaned form data. |
Function | join | Join the tables of the selected datasets into a single table. |
Function | make | Create an AfterValidator to ensure all keys are present in the data. |
Function | safe | Return the value counts of a column, including missing values as None. |
Type Variable | KT | Undocumented |
Type Variable | T | Undocumented |
Type Alias | | Undocumented |
Variable | lnl | LNL fields, dynamically created for unpacking in the pydantic.create_model call. |
Variable | logger | Undocumented |
Variable | | Keys may be True, False, or None, while values are the counts of each. |
Variable | | Keys are male and female, values are the respective counts. |
Variable | | Statistics to be computed and displayed on the dashboard. |
Variable | | Keys are the subsite ICD codes, values are the counts of each. |
Variable | | Keys are the T-stages, values are the counts of each. |
Execute the query defined by the DataexplorerForm.
After validating a DataexplorerForm by calling form.is_valid(), the cleaned data is accessible as the attribute form.cleaned_data. The returned dictionary should be passed to this function as the cleaned_form_data argument.
Based on this cleaned form data, the involvement data from different modalities is combined using the lydata accessor method lydata.accessor.LyDataAccessor.combine. Then, a query is created using the lydata.accessor.C objects and executed on the dataset using the lydata.accessor.LyDataAccessor.query method. The resulting filtered dataset is returned.
QuerySet | Sequence[DatasetModel], method: Literal['max_llh', 'rank'] = 'max_llh') -> pd.DataFrame:
Join the tables of the selected datasets into a single table.
This iterates through the datasets and loads their respective pd.DataFrame tables. It also adds a column ["dataset", "info", "name"] to the table to keep track of which dataset a row belongs to. Finally, it concatenates all tables into a single table and returns it.
In case the datasets are empty, a likewise empty table is created with all the columns necessary to create a Statistics object. These columns are in turn constructed from the schema of the lydata.validator module.
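The joining steps described above can be sketched with plain pandas. This is only an illustration under stated assumptions, not the module's implementation: `join_tables_sketch`, `make_table`, and the `("patient", "info", "age")` column are hypothetical, the tables are passed in as a name-to-DataFrame mapping instead of being loaded from dataset models, and the columns of the empty table are passed in by hand rather than derived from the lydata.validator schema.

```python
import pandas as pd

# Three-level column that records which dataset a row came from.
NAME_COLUMN = ("dataset", "info", "name")


def join_tables_sketch(tables, empty_columns):
    """Tag each table with its dataset name and concatenate them all.

    `tables` maps a dataset name to its DataFrame (an assumption of this
    sketch). `empty_columns` lists the column tuples of the fallback table.
    """
    tagged = []
    for name, table in tables.items():
        table = table.copy()          # do not modify the caller's table
        table[NAME_COLUMN] = name     # remember the row's origin
        tagged.append(table)

    if not tagged:
        # No datasets selected: return an empty table that still has columns.
        columns = pd.MultiIndex.from_tuples([*empty_columns, NAME_COLUMN])
        return pd.DataFrame(columns=columns)

    return pd.concat(tagged, ignore_index=True)


def make_table(ages):
    """Build a tiny table with one three-level column (made-up data)."""
    return pd.DataFrame({("patient", "info", "age"): ages})


joined = join_tables_sketch(
    {"A": make_table([61, 72]), "B": make_table([55])},
    empty_columns=[("patient", "info", "age")],
)
empty = join_tables_sketch({}, empty_columns=[("patient", "info", "age")])
```

Returning an empty table with the expected columns, rather than a bare empty DataFrame, means downstream column lookups do not fail when nothing is selected.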
Create an AfterValidator to ensure all keys are present in the data.
This creates a function that can be used with pydantic's AfterValidator to ensure that all keys are present in the validated data. pydantic first receives the value counts from the safe_value_counts function, validates it, and then calls the function created by this wrapper to ensure that all keys are present.
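A minimal sketch of such a factory, written without pydantic so that it stays self-contained, might look like the following. The names `make_ensure_keys_sketch` and `ensure_bool_keys` are hypothetical, and filling missing keys with a count of 0 is an assumption of this sketch, not something the documentation above specifies.

```python
from typing import Callable, Dict, Hashable, Iterable, TypeVar

KT = TypeVar("KT", bound=Hashable)


def make_ensure_keys_sketch(
    keys: Iterable[KT],
    default: int = 0,  # assumption: missing keys are filled with a zero count
) -> Callable[[Dict[KT, int]], Dict[KT, int]]:
    """Return a function that adds every missing key to a counts dictionary."""
    required = list(keys)

    def ensure_keys(counts: Dict[KT, int]) -> Dict[KT, int]:
        completed = dict(counts)  # leave the validated input untouched
        for key in required:
            completed.setdefault(key, default)
        return completed

    return ensure_keys


# Hypothetical usage: a field whose keys may be True, False, or None.
ensure_bool_keys = make_ensure_keys_sketch([True, False, None])
completed = ensure_bool_keys({True: 4})
```

In a pydantic v2 model, a function like `ensure_bool_keys` could then be attached to a field via `typing.Annotated[..., pydantic.AfterValidator(ensure_bool_keys)]`, so that it runs after the regular validation of the counts.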
Return the value counts of a column, including missing values as None.
>>> column = pd.Series(['a', 'b', 'c', np.nan, 'a', 'b', 'c', 'a', 'b', 'c'])
>>> safe_value_counts(column)
{'a': 3, 'b': 3, 'c': 3, None: 1}
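One way a function with this behavior could be written on top of pandas is sketched below. The name `safe_value_counts_sketch` is hypothetical and the body is an assumption about a possible approach, not the module's actual code.

```python
import numpy as np
import pandas as pd


def safe_value_counts_sketch(column: pd.Series) -> dict:
    """Count the values of a column, reporting missing values under None."""
    # dropna=False keeps NaN as its own entry instead of silently dropping it.
    counts = column.value_counts(dropna=False)
    return {
        (None if pd.isna(key) else key): int(count)
        for key, count in counts.items()
    }


column = pd.Series(["a", "b", "c", np.nan, "a", "b", "c", "a", "b", "c"])
result = safe_value_counts_sketch(column)
```

Mapping NaN to None gives the missing values a key that compares equal to itself, which NaN does not, so the entry can be looked up reliably later.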
Statistics to be computed and displayed on the dashboard.
This class extends the BaseStatistics class by adding the dynamically created fields for the LNLs. That way, I did not have to write them by hand.
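How fields can be generated and attached to a base model with pydantic.create_model is sketched below. Everything named here is hypothetical: `BaseStatisticsSketch`, its `total` field, the `LNLS` list, and the `lnl_` field prefix are illustrative assumptions and do not reflect the real fields of BaseStatistics or Statistics.

```python
from typing import Dict, Optional

from pydantic import BaseModel, create_model


class BaseStatisticsSketch(BaseModel):
    """Hypothetical base model with one hand-written field."""

    total: int = 0


# Assumed list of lymph node levels, for illustration only.
LNLS = ["I", "II", "III"]

# One field per LNL: keys True/False/None map to counts, default is empty.
lnl_fields = {
    f"lnl_{name}": (Dict[Optional[bool], int], {})
    for name in LNLS
}

# Unpack the generated field definitions instead of writing each by hand.
StatisticsSketch = create_model(
    "StatisticsSketch",
    __base__=BaseStatisticsSketch,
    **lnl_fields,
)

stats = StatisticsSketch(total=3, lnl_II={True: 2, False: 1})
```

Adding another level to the list then adds the corresponding field without any change to the class definition.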
The intended use is to first query a table of patients using the execute_query function with the cleaned form data from the DataexplorerForm. Then, pass the queried table to this class's from_table method to compute the statistics. Finally, pass the computed statistics to the context of the dataexplorer.views to be displayed in the rendered HTML or JSON response.
By design, this class's fields mirror the fields of the DataexplorerForm class. This is necessary because any attribute the data can be queried on is also an attribute that statistics can be computed for.