This script prepares the input for Figure 2e: mediation analysis of host and microbiome features for the selected drug combinations. The drug-feature effect graph represents potential mediation effects between host and microbiome features. Solid lines represent drug effects on a feature; colour represents the direction of the effect. Dashed lines between features indicate potential mediation (general mediation model, one-sided P < 0.1); colour represents the sign of Pearson’s correlation coefficient (P < 0.1).
For visualization purposes, each edge between a drug and a feature in this graph carries the effect size of the corresponding drug combination. (E.g. if the combination of a statin with a calcium antagonist synergistically decreases VLDL cholesterol, both statin and calcium antagonist have an edge connecting them to VLDL cholesterol, and the value of each edge is the combination effect size.)
Required files in the input_data folder:
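The edge convention described above can be sketched in a few lines. This is only an illustration, not part of the original analysis: the helper name `combination_edges` and the example effect size are hypothetical, and the only thing taken from the notebook is the `'Combination: DrugA, DrugB'` effector format that the code further down parses.

```python
def combination_edges(effector, feature, combo_effect):
    """Return one (drug, feature, effect) edge per drug in a combination.

    Assumes effector strings look like 'Combination: DrugA, DrugB'.
    """
    drugs = effector.replace('Combination: ', '').split(', ')
    # every drug in the combination gets the same edge value:
    # the effect size of the combination on the feature
    return [(drug, feature, combo_effect) for drug in drugs]


# hypothetical example values, for illustration only
edges = combination_edges('Combination: Statin, Calcium antagonist',
                          'VLDL cholesterol', -0.25)
print(edges)
```

Both drugs end up connected to the same feature with an identical edge value, which is why duplicated edges are dropped before the graph is written out.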
Supplementary Table 6: Features of microbiome, host and metabolome impacted by different drug groups and drug compounds. Results of drug group (or drug compound according to the ATC classification) assessment for its impact on host and microbiome features for each patient group. Compound comparison with Maier et al., Nature 2018, tab shows microbiome features negatively impacted by the drug treatment (for the ATC-level compounds) in at least one patient group, and bacterial species whose growth was inhibited by the same drug in the in vitro experiment.
Supplementary Table 8: Features of microbiome, host and metabolome impacted by different drug combinations. Analysis of the effect of drug combinations, assessed for impact on host and microbiome falling within different measurement categories in each patient group.
Supplementary Table 10: Mediation analysis of host and microbiome features for drug intake, dosage and combinations. Mediation analysis via a regression model of drug effect on each host feature mediated through a microbiome feature or vice versa.
Figure is based on the data from Supplementary Tables 6, 8 and 10.
# load required libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import matplotlib.patches as mpatches
%matplotlib inline
# read combination feature mediation file
fileFolder = './input_data/'
fileName = 'Supplementary_Table_10_2019-09-13434.xlsx'
sheetName = 'Drug combinations'
mediation_df_filtered_combined_filtered = pd.read_excel(fileFolder + fileName,
sheet_name = sheetName)
Get feature information for drug and drug combination associations for further filtering
# read single drug features files
fileName = 'Supplementary_Table_6_2019-09-13434.xlsx'
sheetName = 'Drug group effect'
drugEffect = pd.read_excel(fileFolder + fileName,
sheet_name = sheetName)
# read drug combination features files
fileName = 'Supplementary_Table_8_2019-09-13434.xlsx'
sheetName = 'Data'
drugCombinationEffect = pd.read_excel(fileFolder + fileName,
sheet_name = sheetName)
# concatenate single drug effects and combination effects
allEffects_df = pd.concat([drugCombinationEffect, drugEffect])
# get the patient groups (Sample set column)
listconditions = list(set(allEffects_df['Sample set']))
For each mediation pair, get effect sizes of drugs and combinations and diseases
curset = 'T2D (3)' # select Group T2D (3)
# keep only the effects for the selected patient group to reduce the table size
selectedEffectd_df = allEffects_df[(allEffects_df['Sample set'].str.find(curset)>=0)]
# select only features with congruence opposite of disease (drug and disease effects are opposite)
selectedEffectd_df = selectedEffectd_df[selectedEffectd_df['Congruence']=='Opposite'].copy()
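The substring match on the 'Sample set' column above could arguably be written with `str.contains` as well. The sketch below uses made-up sample-set strings (the multi-group entry is an assumption about how the column might look) and shows that `regex=False` is needed, because the parentheses in `'T2D (3)'` would otherwise be read as a regular-expression group.

```python
import pandas as pd

# hypothetical 'Sample set' values, for illustration only
sample_sets = pd.Series(['T2D (3)', 'T2D (3), Other group', 'Other group'])

# approach used in this notebook: position of the substring, -1 if absent
by_find = sample_sets.str.find('T2D (3)') >= 0

# equivalent literal match; regex=False keeps '(' and ')' from being
# interpreted as regular-expression syntax
by_contains = sample_sets.str.contains('T2D (3)', regex=False)

print(by_find.tolist())
print(by_contains.tolist())
```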
# create a dataframe containing information on the potential features and mediators
mediation_df_filtered_combined_filtered_features = []
for i in range(len(mediation_df_filtered_combined_filtered)):
    curfeature1 = mediation_df_filtered_combined_filtered['Feature1'].iloc[i]
    curfeature2 = mediation_df_filtered_combined_filtered['Feature2_med'].iloc[i]
    curspace1 = mediation_df_filtered_combined_filtered['FeatureSpace1'].iloc[i]
    curspace2 = mediation_df_filtered_combined_filtered['FeatureSpace2'].iloc[i]
    cureffector = mediation_df_filtered_combined_filtered['Effector'].iloc[i]
    cureffector_drugs = cureffector.replace('Combination: ', '')
    cureffector_drugs = cureffector_drugs.split(', ')
    cureffect_df = selectedEffectd_df[
        (((selectedEffectd_df['Feature display name']==curfeature1) &
          (selectedEffectd_df['Feature space']==curspace1)) |
         ((selectedEffectd_df['Feature display name']==curfeature2) &
          (selectedEffectd_df['Feature space']==curspace2)))&
        ((selectedEffectd_df['Effector']=='Group contrast') |
         (selectedEffectd_df['Effector']==cureffector) |
         (selectedEffectd_df['Effector']==cureffector_drugs[0]) |
         (selectedEffectd_df['Effector']==cureffector_drugs[1]))]
    if len(mediation_df_filtered_combined_filtered_features)==0:
        mediation_df_filtered_combined_filtered_features = cureffect_df.copy()
    else:
        mediation_df_filtered_combined_filtered_features = pd.concat([
            mediation_df_filtered_combined_filtered_features, cureffect_df])
mediation_df_filtered_combined_filtered_features_subset = \
mediation_df_filtered_combined_filtered.copy()
Select statin, metformin, aspirin and calcium antagonists to extract the mediation graph.
ploteffectors = list(set(mediation_df_filtered_combined_filtered_features_subset['Effector']))
ploteffectors = [item for item in ploteffectors
if (('Statin' in item) & ('Metformin' in item)) |
(('Statin' in item) & ('Aspirin' in item)) |
(('Statin' in item) & ('Calcium' in item))]
ploteffectors
Filter mediation results by effectors and patient type
mediation_df_filtered_combined_filtered_features_subset
mediation_df_filtered_combined_filtered_features_subset = \
mediation_df_filtered_combined_filtered_features_subset[
(mediation_df_filtered_combined_filtered_features_subset['Effector']==ploteffectors[0]) |
(mediation_df_filtered_combined_filtered_features_subset['Effector']==ploteffectors[1]) |
(mediation_df_filtered_combined_filtered_features_subset['Effector']==ploteffectors[2])].copy()
feature_set = 'T2D (3)'
mediation_df_filtered_combined_filtered_features_subset = \
mediation_df_filtered_combined_filtered_features_subset[
mediation_df_filtered_combined_filtered_features_subset['Sample set']==feature_set].copy()
mediation_df_filtered_combined_filtered_features_subset = \
mediation_df_filtered_combined_filtered_features_subset[
mediation_df_filtered_combined_filtered_features_subset['ACME (average)_P-value']<=0.1].copy()
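The three filters above (effector, patient group, mediation P-value) can also be combined into a single boolean mask. The following is a sketch on a made-up table, not the original data; the column names follow the notebook, while the values and the `'Other group'` label are hypothetical. Using `isin` avoids indexing `ploteffectors[0]` to `ploteffectors[2]` directly, which would fail if fewer than three effectors matched.

```python
import pandas as pd

# toy mediation table; all values are made up for illustration
toy = pd.DataFrame({
    'Effector': ['Combination: Statin, Metformin',
                 'Combination: Statin, Aspirin',
                 'Combination: Metformin, Aspirin',
                 'Combination: Statin, Metformin'],
    'Sample set': ['T2D (3)', 'T2D (3)', 'T2D (3)', 'Other group'],
    'ACME (average)_P-value': [0.05, 0.30, 0.01, 0.02],
})

selected_effectors = ['Combination: Statin, Metformin',
                      'Combination: Statin, Aspirin']

# isin() works for any number of selected effectors
mask = (toy['Effector'].isin(selected_effectors)
        & (toy['Sample set'] == 'T2D (3)')
        & (toy['ACME (average)_P-value'] <= 0.1))
toy_filtered = toy[mask].copy()
print(toy_filtered)
```

Only the first row passes all three conditions: the second fails the P-value threshold, the third is not a selected effector, and the fourth belongs to a different patient group.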
Select feature pairs that are present in drug combo or single drugs feature matrices
newcolumn_names = ['Feature1_drug1','Feature1_drug2',
'Feature1_drug1effect','Feature1_drug2effect',
'Feature1_drugcombo',
'Feature2_drug1','Feature2_drug2',
'Feature2_drug1effect','Feature2_drug2effect',
'Feature2_drugcombo']
for col in newcolumn_names:
    mediation_df_filtered_combined_filtered_features_subset[col] = np.zeros(
        [len(mediation_df_filtered_combined_filtered_features_subset),1])
# prepare a subset of features to record effect of each drug on each feature
mediation_df_filtered_combined_filtered_features_subset = \
mediation_df_filtered_combined_filtered_features_subset.reset_index()
# populate the dataframe with information on which drugs are associated with each feature
for i in range(len(mediation_df_filtered_combined_filtered_features_subset)):
    curfeatures = [mediation_df_filtered_combined_filtered_features_subset['Feature1'].iloc[i],
                   mediation_df_filtered_combined_filtered_features_subset['Feature2_med'].iloc[i]]
    curspaces = [mediation_df_filtered_combined_filtered_features_subset['FeatureSpace1'].iloc[i],
                 mediation_df_filtered_combined_filtered_features_subset['FeatureSpace2'].iloc[i]]
    cureffector = mediation_df_filtered_combined_filtered_features_subset['Effector'].iloc[i]
    cureffector_drugs = cureffector.replace('Combination: ', '')
    cureffector_drugs = cureffector_drugs.split(', ')
    for curfeat_i in range(len(curfeatures)):
        for curdrug_i in range(len(cureffector_drugs)):
            cureffect_df = selectedEffectd_df[
                (selectedEffectd_df['Feature display name']==curfeatures[curfeat_i]) &
                (selectedEffectd_df['Feature space']==curspaces[curfeat_i]) &
                (selectedEffectd_df['Effector']==cureffector_drugs[curdrug_i])]
            if len(cureffect_df)>0:
                mediation_df_filtered_combined_filtered_features_subset.loc[i,
                    'Feature'+str(curfeat_i+1)+'_drug'+str(curdrug_i+1)] = \
                    cureffector_drugs[curdrug_i]
                mediation_df_filtered_combined_filtered_features_subset.loc[i,
                    'Feature'+str(curfeat_i+1)+'_drug'+str(curdrug_i+1)+'effect'] = \
                    cureffect_df['Effect size'].values[0]
        cureffect_df = selectedEffectd_df[
            (selectedEffectd_df['Feature display name']==curfeatures[curfeat_i]) &
            (selectedEffectd_df['Feature space']==curspaces[curfeat_i]) &
            (selectedEffectd_df['Effector'].str.find(cureffector_drugs[0])>=0) &
            (selectedEffectd_df['Effector'].str.find(cureffector_drugs[1])>=0)]
        if len(cureffect_df)>0:
            mediation_df_filtered_combined_filtered_features_subset.loc[i,
                'Feature'+str(curfeat_i+1)+'_drugcombo'] = \
                cureffect_df['Effect size'].values[0]
# for plotting, select only mediated features that are both affected by the drug combination
plotdata_both = mediation_df_filtered_combined_filtered_features_subset[
(mediation_df_filtered_combined_filtered_features_subset['Feature1_drugcombo']!=0) &
(mediation_df_filtered_combined_filtered_features_subset['Feature2_drugcombo']!=0)].copy()
# select only features that pass correlation p-value threshold of 0.1
plotdata_both_corr = plotdata_both[plotdata_both['FeatureFeatureCorrP']<=0.1].copy()
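The combination effect lookup performed row by row above can be pictured as a left join between the mediation pairs and the effect table. The sketch below is an assumption-laden simplification on made-up data: it matches the effector string exactly, whereas the loop above matches any effector containing both drug names, and it only fills the `Feature1_drugcombo` column.

```python
import pandas as pd

# toy effect table in the layout of selectedEffectd_df (values made up)
effects = pd.DataFrame({
    'Feature display name': ['Feature A', 'Feature B', 'Feature A'],
    'Feature space': ['Host', 'Microbiome', 'Host'],
    'Effector': ['Combination: Statin, Aspirin',
                 'Combination: Statin, Aspirin',
                 'Statin'],
    'Effect size': [-0.4, 0.3, -0.1],
})

# toy mediation pair in the layout of the mediation table
pairs = pd.DataFrame({
    'Feature1': ['Feature A'],
    'FeatureSpace1': ['Host'],
    'Feature2_med': ['Feature B'],
    'FeatureSpace2': ['Microbiome'],
    'Effector': ['Combination: Statin, Aspirin'],
})

lookup = effects.rename(columns={'Effect size': 'Feature1_drugcombo'})
merged = pairs.merge(
    lookup,
    how='left',
    left_on=['Feature1', 'FeatureSpace1', 'Effector'],
    right_on=['Feature display name', 'Feature space', 'Effector'])

# mirror the notebook's convention that 0 means "no recorded effect"
merged['Feature1_drugcombo'] = merged['Feature1_drugcombo'].fillna(0)
print(merged[['Feature1', 'Effector', 'Feature1_drugcombo']])
```

A join of this kind returns every matching row, while the loop keeps only the first match, so the two approaches agree only when each feature, feature space and effector triple appears once in the effect table.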
Save mediator features to graph
plotdata = plotdata_both_corr.copy()
# represent the data as graph nodes (drugs and features)
# connected by edges (effect sizes and correlations)
graphnodes = []
graphsources = []
graphedges = []
graphedgesP = []
nodesnames = []
nodestypes=[]
edgetype=[]
for i in range(len(plotdata)):
    cureffector = plotdata['Effector'].iloc[i]
    cureffector_drugs = cureffector.replace('Combination: ', '')
    cureffector_drugs = cureffector_drugs.split(', ')
    # feature 1 drug 1
    graphnodes.append(cureffector_drugs[0])
    graphsources.append(plotdata['Feature1'].iloc[i])
    graphedges.append(plotdata['Feature1_drugcombo'].iloc[i])
    edgetype.append('drug_feat')
    # feature 1 drug 2
    graphnodes.append(cureffector_drugs[1])
    graphsources.append(plotdata['Feature1'].iloc[i])
    graphedges.append(plotdata['Feature1_drugcombo'].iloc[i])
    edgetype.append('drug_feat')
    # feature 2 drug 1
    graphnodes.append(cureffector_drugs[0])
    graphsources.append(plotdata['Feature2_med'].iloc[i])
    graphedges.append(plotdata['Feature2_drugcombo'].iloc[i])
    edgetype.append('drug_feat')
    # feature 2 drug 2
    graphnodes.append(cureffector_drugs[1])
    graphsources.append(plotdata['Feature2_med'].iloc[i])
    graphedges.append(plotdata['Feature2_drugcombo'].iloc[i])
    edgetype.append('drug_feat')
    # feature 1 feature 2
    graphnodes.append(plotdata['Feature1'].iloc[i])
    graphsources.append(plotdata['Feature2_med'].iloc[i])
    graphedges.append(plotdata['FeatureFeatureCorr'].iloc[i])
    graphedgesP.append(plotdata['FeatureFeatureCorrP'].iloc[i])
    edgetype.append('feat_feat')
    # nodes info
    nodesnames.append(plotdata['Feature1'].iloc[i])
    nodestypes.append('Feature')
    nodesnames.append(plotdata['Feature2_med'].iloc[i])
    nodestypes.append('Feature')
    nodesnames.append(cureffector_drugs[0])
    nodestypes.append('Drug')
    nodesnames.append(cureffector_drugs[1])
    nodestypes.append('Drug')
# make a dataframe of the graph edges (one row per pair of connected nodes)
graph_df = {'Node1': graphnodes,
'Node2': graphsources,
'EdgeValue': graphedges,
'EdgeType': edgetype}
graph_df = pd.DataFrame(graph_df)
graph_df = graph_df.drop_duplicates()
graph_df
# UNCOMMENT TO PRINT GRAPH EDGES TO FILE
#graph_df.to_csv('fig2e_mediation_graph_edges.tsv', sep='\t')
# make a dataframe of node types
nodetype_df = pd.DataFrame({'Node': nodesnames, 'Type': nodestypes})
nodetype_df = nodetype_df.drop_duplicates()
nodetype_df
# UNCOMMENT TO PRINT NODE TYPES TO FILE
#nodetype_df.to_csv('fig2e_mediation_graph_node_types.tsv', sep='\t')
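As a rough illustration of how the exported edge table relates to the figure legend (solid lines for drug effects, dashed lines for potential mediation, colour for the sign of the value), the edges could be annotated as below. The function name `edge_direction`, the style labels and the toy edges are hypothetical; the actual figure styling is not part of this notebook.

```python
def edge_direction(value):
    """Classify an edge value by its sign."""
    if value > 0:
        return 'positive'
    if value < 0:
        return 'negative'
    return 'none'


# hypothetical edges in the (Node1, Node2, EdgeValue, EdgeType) layout of graph_df
toy_edges = [
    ('Statin', 'Feature A', -0.4, 'drug_feat'),
    ('Aspirin', 'Feature A', -0.4, 'drug_feat'),
    ('Feature A', 'Feature B', 0.6, 'feat_feat'),
]

# drug-feature edges are drawn solid, feature-feature (mediation) edges dashed
styled = [(n1, n2, edge_direction(v), 'solid' if t == 'drug_feat' else 'dashed')
          for n1, n2, v, t in toy_edges]
for row in styled:
    print(row)
```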
from importlib.metadata import version
import sys
# print package versions
print('Session info:')
print('Python: ', sys.version)
print('numpy: ', version('numpy'))
print('pandas: ', version('pandas'))
print('matplotlib: ', version('matplotlib'))
print('seaborn: ', version('seaborn'))