Mediation is modelled according to the statsmodels.stats.mediation package: Basic mediation analysis with regression model according to [1].
The outcome model: FeatureValue_1 ~ Drug + FeatureValue_2
is tested against mediator model: FeatureValue_2 ~ Drug,
where FeatureValue1 and FeatureValue2 are associated with the Drug treatment.
Mediation analysis is performed in both directions
(FeatureValue1 is tested as mediator and FeatureValue2 is tested as mediator).
The script crreates mediation results dataframe and saves it to file.
[1] Imai, Keele, Tingley (2010). A general approach to causal mediation analysis. Psychological Methods 15:4, 309-334. http://imai.princeton.edu/research/files/BaronKenny.pdf
Required files in the input_data folder:
Supplementary Tables 1 to 4 include information on the cohort characteristics and drug intake with source metadata. Source metadata for the MetaCardis cohort includes disease group, gender, age, study center, body mass index (BMI, kg/m2), alternative healthy eating index (aHEI), diet diversity score (DDS), and dietary approaches to stop hypertension (DASH), physical activity and manual work levels, and smoking status. Source data includes information on the drug intake, drug combinations, dosage and antibiotic use analyzed in the MetaCardis cohort.
Supplementary Table 8: Features of microbiome, host and metabolome impacted by different drug combinations. Analysis of the effect of drug combinations, assessed for impact on host and microbiome falling within different measurement categories in each patient group.
File with source data on host and microbiome feature values per sample.
# load necessary libraries
import pandas as pd
import numpy as np
from scipy import stats
import statsmodels.api as sm
from statsmodels.stats.mediation import Mediation
from IPython.display import display
# prepare file names to download: phenotype, drug and feature information
fileFolder = './input_data/'
featureFile = 'example_feature_table.csv'
hubfileDrugDosage = 'Supplementary_Tables_1_to_4_2019-09-13434.xlsx'
sheetDrug = 'ST 2b'
# supplementary table 8 to filter features affected by drug combinations
fileName = 'Supplementary_Table_8_2019-09-13434.xlsx'
# read file names with features
features_df = pd.read_csv(fileFolder + featureFile, sep=',', index_col=0)
featuresDrug = pd.read_excel(fileFolder + hubfileDrugDosage,
sheet_name = sheetDrug)
# read drug combination features files
sheetName = 'Data'
drugCombinationEffect = pd.read_excel(fileFolder + fileName,
sheet_name = sheetName)
# Read patient group information from Supplementary Table 1
hubfilePhenotype = 'Supplementary_Tables_1_to_4_2019-09-13434.xlsx'
sheetName = 'ST 1b'
indication_df = pd.read_excel(fileFolder + hubfilePhenotype,
sheet_name = sheetName)
# create dataframe with sample - group
featuresGroup = indication_df[['SampleID', 'PatientGroup']]
featuresGroup = featuresGroup.drop_duplicates()
featuresGroup = featuresGroup.set_index('SampleID', drop=True)
##################################################################
# keep only samples for which features are avaiable
common_samples = list(set(featuresGroup.index).intersection(features_df.index))
featuresGroup = featuresGroup.loc[common_samples,:]
#######################################
# select drug combinations and condition for which to perform analysis
drug_pairs = [['STATINE_C', 'METFORMIN_C'],
['STATINE_C', 'ASA_C'],
['STATINE_C', 'CA2_CBL_C']]
drug_pairs_names = [['Statin', 'Metformin'],
['Statin', 'Aspirin'],
['Statin', 'Calcium antagonist']]
cursampleset = '3' #T2D group
# perform mediation analysis for each drug pair
feat_names1 = []
feat_names2 = []
featcorr_DrugFeat = []
featcorrP_DrugFeat = []
featcorr_1_2 = []
featcorrP_1_2 = []
feat_medres = []
feat_drugcombo = []
for drug_i in range(len(drug_pairs)):
drug_name1 = drug_pairs[drug_i][0]
drug_name2 = drug_pairs[drug_i][1]
# extract dataframe for current drug
drug_df = featuresDrug[[item for item in featuresDrug.columns
if (drug_name1 in item) |
(drug_name2 in item)]].copy()
drug_df.index = featuresDrug['SampleID']
# column Drug contains combination
drug_df['Drug'] = drug_df.iloc[:,0] * drug_df.iloc[:,1]
# overlap with phenotype
drug_df = drug_df.loc[featuresGroup.index]
drug_df['Group'] = featuresGroup.loc[drug_df.index,'PatientGroup']
#############################
# get feature names that are changing in this combination
associated_features = drugCombinationEffect[(drugCombinationEffect['Effector'].str.find(drug_pairs_names[drug_i][0])>=0) &
(drugCombinationEffect['Effector'].str.find(drug_pairs_names[drug_i][1])>=0) &
(drugCombinationEffect['Sample set'].str.find(cursampleset)>=0)].copy()
selected_features = list(set(associated_features['Feature display name']).intersection(features_df.columns))
#############################
print('Running mediation analysis for combination ', drug_name1, ' and ', drug_name2, ' in ', cursampleset)
for i in range(len(selected_features)):
feat1name = selected_features[i]
for j in range(i+1, len(selected_features)):
feat2name = selected_features[j]
print('Performing mediation analysis for ', feat1name, ' and ', feat2name)
# add features to drug df
drugmediation_df = pd.concat([drug_df, features_df.loc[:,[feat1name,feat2name]]], axis=1)
# select only patients from one group
testdata = drugmediation_df[drugmediation_df['Group']==cursampleset].copy()
# reaplce nans with 0
testdata = testdata.fillna(0)
# doublecheck that feature values are numeric
testdata[feat1name] = pd.to_numeric(testdata[feat1name])
testdata[feat2name] = pd.to_numeric(testdata[feat2name])
#rename columns to generic for easier formula definition for mediation analysis
testdata.columns = [item.replace(feat1name, 'FeatureValue_1') for item in testdata.columns]
testdata.columns = [item.replace(feat2name, 'FeatureValue_2') for item in testdata.columns]
# change drug from boolean to 0 and 1
testdata['Drug'] = [1 if testdata['Drug'].iloc[i]==True else 0 for i in range(len(testdata))]
# model feature 1 as mediator
outcome_model = sm.OLS.from_formula("FeatureValue_1 ~ Drug + FeatureValue_2",
data = testdata)
mediator_model = sm.OLS.from_formula("FeatureValue_2 ~ Drug",
data = testdata)
med = Mediation(outcome_model, mediator_model, "Drug", "FeatureValue_2")
med_result = med.fit(n_rep = 100)
res = med_result.summary()
# save results
feat_names1.append(feat1name)
feat_names2.append(feat2name)
feat_medres.append(res)
#model feature 2 as mediator
outcome_model = sm.OLS.from_formula("FeatureValue_2 ~ Drug + FeatureValue_1",
data = testdata)
mediator_model = sm.OLS.from_formula("FeatureValue_1 ~ Drug",
data = testdata)
med = Mediation(outcome_model, mediator_model, "Drug", "FeatureValue_1")
med_result = med.fit(n_rep = 100)
res = med_result.summary()
# save results
feat_names1.append(feat2name)
feat_names2.append(feat1name)
feat_medres.append(res)
# calculate correlations
res = stats.spearmanr(testdata['Drug'], testdata['FeatureValue_1'])
featcorr_DrugFeat.append(res[0])
featcorrP_DrugFeat.append(res[1])
res = stats.spearmanr(testdata['Drug'], testdata['FeatureValue_2'])
featcorr_DrugFeat.append(res[0])
featcorrP_DrugFeat.append(res[1])
res = stats.spearmanr(testdata['FeatureValue_1'], testdata['FeatureValue_2'])
featcorr_1_2.append(res[0])
featcorrP_1_2.append(res[1])
res = stats.spearmanr(testdata['FeatureValue_2'], testdata['FeatureValue_1'])
featcorr_1_2.append(res[0])
featcorrP_1_2.append(res[1])
feat_drugcombo.append('Combination: ' + drug_name1 + ', ' + drug_name2)
feat_drugcombo.append('Combination: ' + drug_name1 + ', ' + drug_name2)
# compile dataframe with mediation results
medres_df = []
for i in range(len(feat_medres)):
df = feat_medres[i].copy()
df = pd.melt(df.assign(index=df.index), id_vars=['index'])
df = pd.DataFrame(np.reshape(df['value'].values,(1,np.shape(df)[0])),
columns=df['index']+'_'+df['variable'])
if i==0:
medres_df = df
else:
medres_df = pd.concat([medres_df, df])
medres_df['Feature1'] = feat_names1
medres_df['Feature2_med'] = feat_names2
medres_df['FeatureDrugCorr'] = featcorr_DrugFeat
medres_df['FeatureDrugCorrP'] = featcorrP_DrugFeat
medres_df['FeatureFeatureCorr'] = featcorr_1_2
medres_df['FeatureFeatureCorrP'] = featcorrP_1_2
medres_df['Effector'] = feat_drugcombo
medres_df['Sample set'] = cursampleset
###############################
# select a subset of columns to print to file
medres_df = medres_df[['Effector', # drug combination pair
'Sample set', # patient group
'Feature1', 'Feature2_med', # Feature (1) and mediator (2)
'ACME (average)_Estimate', # Average estimate of mediated effect
'ADE (average)_Estimate', # Average estimate of direct drug effect
'Total effect_Estimate', # Estimate of total effect
'ACME (average)_P-value', # P-value of mediated effect
'ADE (average)_P-value', # P-value of direct drug effect
'Total effect_P-value', # P-value of total effect
'FeatureDrugCorr', 'FeatureDrugCorrP', # Pearson corr and p-value between drug and Feature 1
'FeatureFeatureCorr', 'FeatureFeatureCorrP']] #Pearson corr and p-value between Feature 1 and Feature 2
# display mediation analysis results
display(medres_df[medres_df['ACME (average)_P-value']<=0.1])
# Uncomment to print to file
medres_df.to_csv(fileFolder + 'mediation_results_drug_combination.csv',
index=0)
import pkg_resources
import sys
#print package versions
print('Sesssion info:')
print('Python: ', sys.version)
print('numpy: ', pkg_resources.get_distribution('numpy').version)
print('pandas: ', pkg_resources.get_distribution('pandas').version)
print('scipy: ', pkg_resources.get_distribution('scipy').version)
print('statsmodels: ', pkg_resources.get_distribution('statsmodels').version)