class documentation

class Dataset(loggers.ModelLoggerMixin, models.Model):


A collection of patients, usually imported from a CSV file in a GitHub repo.

When created, this model fetches the CSV file from the GitHub repo and adds the Patient entries to the database. It also creates the related Tumor and Diagnose entries.

Class Meta Undocumented
Static Method get_institution Return the institution that provided the dataset.
Method __str__ Undocumented
Method check_integrity Check whether the dataset is still consistent with the GitHub repo.
Method compute_fields Dynamically compute fields from the GitHub repository.
Method delete Delete the model instance from the database.
Method fetch_dataframe Return the dataset as a pandas DataFrame.
Method fetch_file Return the GitHub file object.
Method fetch_readme Return the README.md file of the dataset as a string.
Method fetch_repo Return the GitHub repository object.
Method import_csv_to_db Import the dataset from the CSV file into the database.
Method lock Lock the dataset, so that it cannot be edited anymore.
Method save Save the model instance to the database.
Method unlock Unlock the dataset, so that it can be edited again.
Instance Variable data_path Path to the CSV file containing the patient data inside the git repo.
Instance Variable data_sha SHA of the CSV file in the GitHub repo.
Instance Variable date_created Date and time when the dataset was created.
Instance Variable git_repo_name Name of the GitHub repository that contains the dataset.
Instance Variable git_repo_owner Owner of the GitHub repository that contains the dataset.
Instance Variable institution The institution that provided the dataset.
Instance Variable is_locked Whether the dataset is locked or not. Locked datasets cannot be edited.
Instance Variable is_outdated Whether the data file has been updated since the last import.
Instance Variable is_public Whether the dataset is public or not. Public datasets can be viewed by everyone.
Instance Variable revision Git revision in which to search for the data, e.g. a commit hash or a tag.
Property data_url Return the URL of the data file in the GitHub repository.
Property git_repo_id Return the ID of the GitHub repository.
Property git_repo_url Return the URL of the GitHub repository.
Property name Return the name of the dataset.
Property patient_count Return the number of patients in the dataset.
Instance Variable _dataframe Undocumented
Instance Variable _file Undocumented
Instance Variable _readme Undocumented
Instance Variable _repo Undocumented
@staticmethod
def get_institution(table, fallback):

Return the institution that provided the dataset.

Parameters
table: pd.DataFrame Undocumented
fallback: Institution Undocumented
Returns
Institution Undocumented
def __str__(self):

Undocumented

def check_integrity(self):

Check whether the dataset is still consistent with the GitHub repo.
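One way such a consistency check can work is sketched below. It assumes that `data_sha` holds the Git blob SHA-1 of the CSV file, which this page does not state explicitly. The helper names `git_blob_sha` and `matches_stored_sha` are hypothetical and not part of this class.

```python
import hashlib


def git_blob_sha(content: bytes) -> str:
    """Compute the SHA-1 that Git assigns to a blob with this content."""
    # Git hashes the header "blob <size>\0" followed by the raw file bytes.
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()


def matches_stored_sha(stored_sha: str, current_content: bytes) -> bool:
    """Hypothetical helper: True if the file still matches the stored SHA."""
    return git_blob_sha(current_content) == stored_sha


# Illustrative data only, not taken from a real dataset.
csv_at_import = b"patient_id,age\n1,63\n"
stored = git_blob_sha(csv_at_import)

unchanged = matches_stored_sha(stored, csv_at_import)
changed = matches_stored_sha(stored, csv_at_import + b"2,58\n")
```

Here `unchanged` is `True` and `changed` is `False`. A mismatch of this kind is presumably what the `is_outdated` flag records.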

def compute_fields(self, git_repo_url, revision, data_path, user_institution, **_kwargs):

Dynamically compute fields from the GitHub repository.

Parameters
git_repo_url: str Undocumented
revision: str Undocumented
data_path: str Undocumented
user_institution: Institution Undocumented
**_kwargs Undocumented
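Since the class stores `git_repo_owner` and `git_repo_name` but this method receives a `git_repo_url`, one of the computed steps is presumably splitting the URL into those two parts. The following is a minimal sketch of that step only, under that assumption. `split_repo_url` is a hypothetical helper, and the example URL is made up.

```python
from urllib.parse import urlparse


def split_repo_url(git_repo_url: str) -> tuple[str, str]:
    """Hypothetical helper: extract (owner, name) from a GitHub repository URL."""
    path = urlparse(git_repo_url).path.strip("/")
    parts = path.split("/")
    if len(parts) < 2 or not all(parts[:2]):
        raise ValueError(f"not a GitHub repository URL: {git_repo_url!r}")
    owner, name = parts[0], parts[1]
    # Clone URLs often end in ".git"; drop the suffix if present.
    if name.endswith(".git"):
        name = name[: -len(".git")]
    return owner, name


owner, name = split_repo_url("https://github.com/example-owner/example-repo.git")
```

With the made-up URL above, `owner` is `"example-owner"` and `name` is `"example-repo"`.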
def delete(self, *args, override=False, **kwargs):

Delete the model instance from the database.

Raise an error if the dataset is locked and override is not set to True.

Parameters
*args Undocumented
override: bool Undocumented
**kwargs Undocumented
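The lock guard described above can be sketched without Django as follows. This is an illustration of the pattern, not the actual implementation. `LockedDatasetError` and `LockableSketch` are hypothetical names, and the real method may raise a different exception type.

```python
class LockedDatasetError(Exception):
    """Hypothetical exception; the real class may raise a different type."""


class LockableSketch:
    """Plain-Python stand-in for the Django model, for illustration only."""

    def __init__(self, is_locked: bool = False):
        self.is_locked = is_locked
        self.deleted = False

    def delete(self, *args, override: bool = False, **kwargs):
        # Refuse to touch a locked dataset unless the caller explicitly overrides.
        if self.is_locked and not override:
            raise LockedDatasetError("Cannot delete a locked dataset.")
        # A real model would call super().delete(*args, **kwargs) here.
        self.deleted = True


dataset = LockableSketch(is_locked=True)
try:
    dataset.delete()
except LockedDatasetError:
    blocked = True
else:
    blocked = False

dataset.delete(override=True)  # explicit override bypasses the lock
```

Making `override` keyword-only keeps it from being confused with the positional arguments that are forwarded to the parent method.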
def fetch_dataframe(self):

Return the dataset as a pandas DataFrame.

Returns
pd.DataFrame Undocumented
def fetch_file(self):

Return the GitHub file object.

def fetch_readme(self):

Return the README.md file of the dataset as a string.

Returns
str Undocumented
def fetch_repo(self):

Return the GitHub repository object.

def import_csv_to_db(self):

Import the dataset from the CSV file into the database.

This method locks the dataset right afterwards to prevent editing of the uploaded patients.
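The import-then-lock sequence can be illustrated with a small stand-in that uses only the standard library. It is a sketch under stated assumptions: `ImportSketch` and `import_csv` are hypothetical names, rows are kept as plain dictionaries instead of Patient, Tumor, and Diagnose entries, and the CSV columns are invented for the example.

```python
import csv
import io
from dataclasses import dataclass, field


@dataclass
class ImportSketch:
    """Plain-Python stand-in for the Django model, for illustration only."""

    is_locked: bool = False
    patients: list = field(default_factory=list)

    def import_csv(self, csv_text: str) -> int:
        """Hypothetical helper: load all rows, then lock the dataset."""
        for row in csv.DictReader(io.StringIO(csv_text)):
            # The real method would create database entries here.
            self.patients.append(row)
        # Lock right after the import so the imported rows stay unchanged.
        self.is_locked = True
        return len(self.patients)


sketch = ImportSketch()
count = sketch.import_csv("patient_id,age\n1,63\n2,58\n")
```

After the call, `count` is 2 and `sketch.is_locked` is `True`.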

def lock(self):

Lock the dataset, so that it cannot be edited anymore.

def save(self, *args, override=False, **kwargs):

Save the model instance to the database.

Raise an error if the dataset is locked and override is not set to True.

Parameters
*args Undocumented
override: bool Undocumented
**kwargs Undocumented
def unlock(self):

Unlock the dataset, so that it can be edited again.

data_path =

Path to the CSV file containing the patient data inside the git repo.

data_sha =

SHA of the CSV file in the GitHub repo.

date_created =

Date and time when the dataset was created.

git_repo_name =

Name of the GitHub repository that contains the dataset.

git_repo_owner =

Owner of the GitHub repository that contains the dataset.

institution =

The institution that provided the dataset.

is_locked: bool =

Whether the dataset is locked or not. Locked datasets cannot be edited.

is_outdated: bool =

Whether the data file has been updated since the last import.

is_public =

Whether the dataset is public or not. Public datasets can be viewed by everyone.

revision =

Git revision in which to search for the data, e.g. a commit hash or a tag.

@property
data_url =

Return the URL of the data file in the GitHub repository.
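A URL of this kind can be assembled from the stored fields `git_repo_owner`, `git_repo_name`, `revision`, and `data_path`. The sketch below assumes the standard GitHub `blob` page URL layout; whether the property actually returns that form or a raw-content URL is not stated here. `build_data_url` is a hypothetical helper and the argument values are made up.

```python
from urllib.parse import quote


def build_data_url(owner: str, name: str, revision: str, data_path: str) -> str:
    """Hypothetical helper: GitHub page URL of a file at a given revision."""
    # quote() leaves "/" intact by default, so nested paths keep their structure.
    return (
        f"https://github.com/{quote(owner)}/{quote(name)}"
        f"/blob/{quote(revision)}/{quote(data_path)}"
    )


url = build_data_url("example-owner", "example-repo", "main", "data/patients.csv")
# url == "https://github.com/example-owner/example-repo/blob/main/data/patients.csv"
```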

@property
git_repo_id =

Return the ID of the GitHub repository.

@property
git_repo_url =

Return the URL of the GitHub repository.

@property
name =

Return the name of the dataset.

@property
patient_count: int =

Return the number of patients in the dataset.

_dataframe =

Undocumented

_file =

Undocumented

_readme =

Undocumented

_repo =

Undocumented