Fine-grained subphenotypes in acute kidney injury populations based on deep clustering: derivation and interpretation

Paper link: https://www.sciencedirect.com/science/article/pii/S1386505624002168

Workflow

Study Design: Individuals who developed AKI within 48 hours of admission were included, and temporal laboratory measurements were extracted from two public databases.
Representation Learning: Laboratory data were split to derive multiple representations with supervised LSTM autoencoders.
Subphenotype Derivation: Each representation was clustered with K-means. A similarity matrix was then built from the frequency with which each pair of patients was assigned to the same cluster. Finally, subphenotypes were derived by applying K-means to the similarity matrix.
Subphenotype Interpretation: Static pattern analysis, dynamic pattern analysis, and predictive interpretation were conducted to interpret the subphenotypes.

Understanding deep consensus clustering

a. Time series electronic health records (EHR) were utilized as input data for the Bi-LSTM encoder in our study. The Bi-LSTM encoder processed the EHR data, and the hidden state in the final time step was extracted as the representation. Subsequently, the Bi-LSTM decoder employed both the representation and the time-reversed input data to reconstruct the original input data. Simultaneously, the supervisor component was co-trained with the Bi-LSTM autoencoder. This supervisor component utilized the extracted representation as input to predict the probability of mortality [Paper].
b. We encoded the EHR into multiple representations with different hidden dimensions, then clustered each representation with K-means. In this illustration, K-means was run on five representations (left). The similarity between two patients was quantified as the frequency with which they were clustered into the same group. For instance, patient 1 and patient 2 were clustered together in four out of five representations, so their similarity was 4/5. The resulting symmetric similarity matrix records the pairwise similarities between patients (right).
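The pairwise similarity described in (b) can be sketched as follows. This is a minimal illustration using NumPy only, not the repository's implementation: the function name co_association_matrix and the label values are hypothetical, and the cluster labels are typed in by hand rather than produced by K-means.

```python
import numpy as np

def co_association_matrix(label_sets):
    """Fraction of clusterings in which each pair of samples shares a cluster."""
    label_sets = np.asarray(label_sets)  # shape: (n_clusterings, n_samples)
    n_clusterings, n_samples = label_sets.shape
    similarity = np.zeros((n_samples, n_samples))
    for labels in label_sets:
        # same[i, j] is True when samples i and j share a cluster in this run
        same = labels[:, None] == labels[None, :]
        similarity += same
    return similarity / n_clusterings

# Hypothetical cluster labels for 4 patients from 5 representations.
# Patients 0 and 1 share a cluster in the first four runs only.
label_sets = [
    [0, 0, 1, 1],
    [1, 1, 0, 0],
    [0, 0, 0, 1],
    [2, 2, 1, 0],
    [0, 1, 1, 0],
]
S = co_association_matrix(label_sets)
print(S[0, 1])  # similarity of the first two patients (4 of 5 runs)
```

Only label equality within a run matters, so the cluster IDs do not need to be aligned across representations.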

How to explore this project

Installing dependencies

All required dependencies are listed in requirements.txt. To install them, run the following command:

pip install -r requirements.txt

Data preparation

Save your time-series data as a .pkl file and place it in the dataset folder. Dataset handling can be customized in utils.py.
The input data X must be three-dimensional, with shape [number of samples, time dimension, feature dimension].
The input data Y must be one-dimensional. Through supervised learning, the representations are correlated with Y [Paper].
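As a rough sketch of the expected shapes, the snippet below builds synthetic X and Y arrays and round-trips them through a pickle file. The file name example.pkl and the dictionary keys "X" and "Y" are assumptions made for illustration; the layout that utils.py actually reads may differ and should be checked there. The file is written to a temporary directory here, whereas a real dataset would go in the dataset folder.

```python
import os
import pickle
import tempfile

import numpy as np

rng = np.random.default_rng(0)
n_samples, n_steps, n_features = 8, 12, 5  # illustrative sizes only

# X: [number of samples, time dimension, feature dimension]
X = rng.normal(size=(n_samples, n_steps, n_features)).astype(np.float32)
# Y: one value per sample, e.g. a binary outcome
Y = rng.integers(0, 2, size=n_samples)

# Basic shape checks before saving
assert X.ndim == 3 and Y.ndim == 1 and len(X) == len(Y)

# Hypothetical file name and keys; adapt them to what utils.py expects.
path = os.path.join(tempfile.mkdtemp(), "example.pkl")
with open(path, "wb") as f:
    pickle.dump({"X": X, "Y": Y}, f)

with open(path, "rb") as f:
    loaded = pickle.load(f)
print(loaded["X"].shape, loaded["Y"].shape)
```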

Run codes

Get representations

To generate representations in multiple dimensions in parallel, run the following command:

./01_get_representations.sh

The parameters are described below.

Parameter	Explanation
max_dim	max hidden dimensions of LSTM encoder/decoder
hidden_dims	hidden dimensions of LSTM encoder/decoder
input_dim	number of input features
n_layers	number of LSTM layers in encoder/decoder
pre_epoch	number of pre-training epochs of the autoencoder without the supervisor
lambda_AE	weight of autoencoder loss
lambda_outcome	weight of supervisor loss
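One plausible reading of pre_epoch, lambda_AE, and lambda_outcome is a weighted sum of a reconstruction loss and an outcome loss, with the outcome term switched on after pre-training. The sketch below illustrates that reading with NumPy. It is an assumption, not the repository's training code: the function name combined_loss, the choice of mean squared error and binary cross-entropy, and the way pre_epoch gates the supervisor are all hypothetical.

```python
import numpy as np

def combined_loss(x, x_hat, y, p, epoch, pre_epoch, lambda_AE, lambda_outcome):
    """Weighted autoencoder + supervisor loss (illustrative assumption)."""
    # Reconstruction term: mean squared error between input and reconstruction
    loss_ae = np.mean((x - x_hat) ** 2)
    if epoch < pre_epoch:
        # Assumed pre-training phase: autoencoder only, no supervisor term
        return lambda_AE * loss_ae
    # Supervisor term: binary cross-entropy on the predicted outcome probability
    p = np.clip(p, 1e-7, 1 - 1e-7)
    loss_outcome = -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))
    return lambda_AE * loss_ae + lambda_outcome * loss_outcome

x = np.zeros((2, 3, 4))        # toy input: 2 samples, 3 time steps, 4 features
x_hat = np.zeros_like(x)       # perfect reconstruction, so the AE term is 0
y = np.array([1.0, 0.0])       # toy outcomes
p = np.array([0.5, 0.5])       # uninformative predicted probabilities

pre = combined_loss(x, x_hat, y, p, epoch=0, pre_epoch=5,
                    lambda_AE=1.0, lambda_outcome=2.0)
joint = combined_loss(x, x_hat, y, p, epoch=5, pre_epoch=5,
                      lambda_AE=1.0, lambda_outcome=2.0)
print(pre, joint)
```

Under this reading, raising lambda_outcome pulls the representation toward outcome prediction, while raising lambda_AE favors faithful reconstruction.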

The script writes multiple representations rep_{n}.pkl to the folder ./representations and logs to the folder ./runs. The representations in different hidden dimensions can be visualized with 04_visualization_of_cm&rep.py.

Conduct consensus clustering

To conduct consensus clustering on the representations in multiple dimensions in parallel, run 02_consensus_clustering.py. It generates figures of the relative change in area under the CDF curve and of the average consensus value of each cluster.
The derived subphenotypes consensus_cluster_{k}.pkl are written to the folder ./results. The consensus matrices show the similarity between each pair of patients. In the consensus matrix plots, the blue lines indicate the cluster divisions.
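The "relative change in area under the CDF curve" can be illustrated with the sketch below. It follows one common convention from the consensus clustering literature (area under the empirical CDF of the off-diagonal consensus values, compared between consecutive values of k); the function names cdf_area and relative_area_change are hypothetical, and the exact definition used in 02_consensus_clustering.py may differ.

```python
import numpy as np

def cdf_area(consensus, n_bins=100):
    """Area under the empirical CDF of the off-diagonal consensus values."""
    iu = np.triu_indices_from(consensus, k=1)
    values = np.sort(consensus[iu])
    grid = np.linspace(0.0, 1.0, n_bins + 1)
    # CDF(c) = fraction of patient pairs with consensus value <= c
    cdf = np.searchsorted(values, grid, side="right") / values.size
    # Trapezoidal rule over the grid
    return float(np.sum((cdf[1:] + cdf[:-1]) / 2 * np.diff(grid)))

def relative_area_change(areas_by_k):
    """Relative change in CDF area between consecutive numbers of clusters."""
    ks = sorted(areas_by_k)
    return {k: (areas_by_k[k] - areas_by_k[prev]) / areas_by_k[prev]
            for prev, k in zip(ks, ks[1:])}

# Toy consensus matrix: two perfectly separated pairs of patients
consensus = np.array([
    [1.0, 1.0, 0.0, 0.0],
    [1.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 1.0],
    [0.0, 0.0, 1.0, 1.0],
])
area = cdf_area(consensus)

# Hypothetical areas for k = 2, 3, 4, only to show the calculation
changes = relative_area_change({2: 0.50, 3: 0.60, 4: 0.63})
print(area, changes)
```

A small relative change when moving from k - 1 to k suggests that adding another cluster no longer improves the consensus much, which is one heuristic for choosing k.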

Conduct further analysis

The scripts for further analysis are summarized below.

Code	Description
03_sensitivity_analyze.py	Sensitivity analysis of consensus matrix by bootstrapping
04_visualization_of_cm&rep.py	Visualizations of consensus matrices and representations
05_cluster_transfer.R	Sankey diagram of the transfer of clusters
06_relative_risk.R	Forest plot of relative risk of mortality
07_comorbidity_bubble.py	Bubble plot of the distribution of comorbidities
08_KDIGO_circlize.R	Chord diagram of the relationship between subphenotypes and KDIGO stages
09_survival_analysis.R	Survival analysis and visualization of subphenotypes
10_KDIGO_dynamic.R	Bar chart on the dynamic proportion of KDIGO stage

YongsenTan/Deep-Consensus-Clustering
