# Converting the BIGG universal model to the new COBRAPY schema

This is a short illustration of how to convert the BIGG universal model to the COBRAPY schema. Let's start by downloading and reading the BIGG universal model JSON.

In [1]:
!wget http://bigg.ucsd.edu/static/namespace/universal_model.json

import json

# Use a context manager so the file handle is closed after reading.
with open("universal_model.json", "r") as handle:
    bigg = json.load(handle)
str(bigg)[:200]

--2023-03-24 10:36:59--  http://bigg.ucsd.edu/static/namespace/universal_model.json
Resolving bigg.ucsd.edu (bigg.ucsd.edu)... 169.228.33.117
Connecting to bigg.ucsd.edu (bigg.ucsd.edu)|169.228.33.117|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 21454113 (20M) [application/json]
Saving to: ‘universal_model.json.4’


2023-03-24 10:37:03 (6.57 MB/s) - ‘universal_model.json.4’ saved [21454113/21454113]



"{'metabolites': [{'id': '4crsol_c', 'name': 'P-Cresol', 'compartment': '', 'notes': {'original_bigg_ids': ['4crsol', '4crsol_c', '_4crsol_c']}, 'annotation': [['KEGG Compound', 'http://identifiers.org"

Here we see that the annotations are nested lists, but they need to become dicts. So let's do the conversion.

In [2]:
from collections import defaultdict

def group_annotation(bigg_annotation: list) -> dict:
    """Group the BIGG annotations into a dictionary."""
    annotations = defaultdict(list)
    for ann in bigg_annotation:
        try:
            uri = ann[1].split("://identifiers.org/")[1]
            provider, ide = uri.split("/")
        except Exception:
            continue
        annotations[provider].append(ide)
    return dict(annotations)

for obj in ["metabolites", "reactions", "genes"]:
    for species in bigg[obj]:
        species["annotation"] = group_annotation(species["annotation"])
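
To see what the grouping does on a small scale, here is a self-contained sketch that applies the same idea to a hand-written annotation list. The helper name `group_pairs` and the sample entries are illustrative assumptions, not data taken from the BIGG file; the URIs are assumed to follow the `http://identifiers.org/<provider>/<identifier>` layout that the splitting logic above relies on.

```python
from collections import defaultdict


def group_pairs(pairs):
    """Group [name, identifiers.org URI] pairs by provider prefix."""
    grouped = defaultdict(list)
    for _name, uri in pairs:
        # Keep only what follows the identifiers.org host.
        _, sep, path = uri.partition("://identifiers.org/")
        parts = path.split("/")
        if not sep or len(parts) != 2:
            # Skip anything that is not exactly <provider>/<identifier>.
            continue
        provider, identifier = parts
        grouped[provider].append(identifier)
    return dict(grouped)


# Hypothetical sample input in the nested-list layout.
sample = [
    ["KEGG Compound", "http://identifiers.org/kegg.compound/C01468"],
    ["CHEBI", "http://identifiers.org/chebi/CHEBI:17847"],
    ["CHEBI", "http://identifiers.org/chebi/CHEBI:1816"],
    ["Broken", "not-a-uri"],
]
result = group_pairs(sample)
print(result)
```

Entries that do not match the expected layout are dropped silently, which is also what the `try`/`except` in the cell above does. If dropped entries matter, counting them before skipping would be one way to notice them.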

In [3]:
bigg["metabolites"][0]

{'id': '4crsol_c',
 'name': 'P-Cresol',
 'compartment': '',
 'notes': {'original_bigg_ids': ['4crsol', '4crsol_c', '_4crsol_c']},
 'annotation': {'kegg.compound': ['C01468'],
  'chebi': ['CHEBI:11981',
   'CHEBI:17847',
   'CHEBI:1816',
   'CHEBI:20352',
   'CHEBI:44726'],
  'hmdb': ['HMDB01858', 'HMDB13762'],
  'inchikey': ['IWDCLRJOBJJRNH-UHFFFAOYSA-N'],
  'biocyc': ['META:CPD-108'],
  'metanetx.chemical': ['MNXM828'],
  'seed.compound': ['cpd01042']}}

This looks correct now. Let's also fix the compartments.

In [4]:
bigg["compartments"]

{}

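The compartments are not defined in the model. Let's have a look at which ones are actually used, taking the suffix after the last underscore of each metabolite ID as its compartment.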

In [5]:
for m in bigg["metabolites"]:
    m["compartment"] = m["id"].split("_")[-1]

compartments = {m["compartment"] for m in bigg["metabolites"]}
compartments

{'c',
 'cm',
 'cx',
 'e',
 'f',
 'g',
 'h',
 'i',
 'im',
 'l',
 'm',
 'mm',
 'n',
 'p',
 'r',
 's',
 'u',
 'um',
 'v',
 'w',
 'x',
 'y'}
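
The suffix extraction can be sketched in isolation as follows. The helper name `compartment_of` and the sample IDs are illustrative assumptions; the second ID mimics the double-underscore style that appears in BIGG identifiers, and the third shows what happens when an ID has no underscore at all.

```python
def compartment_of(metabolite_id):
    """Return the text after the last underscore of a metabolite ID."""
    # rpartition splits once from the right, so earlier underscores
    # (including double ones) stay in the base part.
    _base, sep, suffix = metabolite_id.rpartition("_")
    return suffix if sep else ""


# Hypothetical sample IDs.
ids = ["4crsol_c", "12ppd__S_e", "h2o"]
suffixes = [compartment_of(i) for i in ids]
print(suffixes)
```

One difference from `m["id"].split("_")[-1]` is worth noting: for an ID without any underscore, `split` returns the whole ID as the "compartment", while this sketch returns an empty string. Which behavior is preferable depends on how such IDs should be treated.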

Now we try to name them using the default compartment list in COBRAPY. Compartments that are not in the list are labeled "unknown".

In [6]:
from cobra.medium.annotations import compartment_shortlist
bigg["compartments"] = {c: compartment_shortlist.get(c, ["unknown"])[0] for c in compartments}
bigg["compartments"]

{'e': 'extracellular',
 'i': 'unknown',
 'y': 'unknown',
 'h': 'chloroplast',
 'n': 'nucleus',
 'c': 'cytoplasm',
 'm': 'mitochondrion',
 'g': 'golgi',
 'r': 'unknown',
 'v': 'vacuole',
 'p': 'periplasm',
 'w': 'cell wall',
 'im': 'mitochondrial intermembrane space',
 's': 'eyespot',
 'u': 'thylakoid',
 'mm': 'mitochondrial membrane',
 'x': 'peroxisome',
 'l': 'lysosome',
 'um': 'unknown',
 'cm': 'unknown',
 'cx': 'unknown',
 'f': 'flagellum'}
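
Several compartments end up as "unknown", so it can be useful to list them explicitly for manual follow-up. The sketch below uses a small hypothetical `shortlist` dictionary as a stand-in for the COBRAPY list, assuming only that it maps a compartment ID to a list of names (which is what the `.get(c, ["unknown"])[0]` lookup above implies).

```python
# Hypothetical stand-in for the shortlist: compartment ID -> list of names.
shortlist = {
    "c": ["cytoplasm", "cytosol"],
    "e": ["extracellular"],
    "p": ["periplasm"],
}
found = {"c", "e", "p", "cx", "um"}

# Take the first listed name, or "unknown" when the ID is missing.
names = {c: shortlist.get(c, ["unknown"])[0] for c in sorted(found)}

# Collect the IDs that still need a manual name.
unresolved = sorted(c for c, name in names.items() if name == "unknown")
print(names)
print(unresolved)
```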

Now we save the model to JSON and check whether we can read it back.

In [7]:
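# Use a context manager so the file is flushed and closed before it is read again.
with open("universal_model_cobrapy.json", "w") as handle:
    json.dump(bigg, handle)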

In [8]:
from cobra.io import load_json_model
model = load_json_model("universal_model_cobrapy.json")
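
Independently of COBRAPY, a quick structural check can confirm that every annotation was converted and that the JSON survives a save and reload. The following is a minimal sketch using only the standard library; the helper `annotation_problems` and the miniature `toy` model are hypothetical and exist only for illustration.

```python
import json
import os
import tempfile


def annotation_problems(model_dict):
    """Return IDs of entries whose annotation is not a dict of lists."""
    bad = []
    for section in ("metabolites", "reactions", "genes"):
        for entry in model_dict.get(section, []):
            ann = entry.get("annotation")
            ok = isinstance(ann, dict) and all(
                isinstance(v, list) for v in ann.values()
            )
            if not ok:
                bad.append(entry["id"])
    return bad


# Hypothetical miniature model: one converted and one unconverted entry.
toy = {
    "metabolites": [
        {"id": "a_c", "annotation": {"chebi": ["CHEBI:1816"]}},
        {"id": "b_c", "annotation": [["CHEBI", "http://identifiers.org/chebi/CHEBI:1"]]},
    ],
    "reactions": [],
    "genes": [],
}
problems = annotation_problems(toy)

# Round trip through a temporary file, closing each handle before reuse.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "toy.json")
    with open(path, "w") as handle:
        json.dump(toy, handle)
    with open(path) as handle:
        reloaded = json.load(handle)

print(problems)
print(reloaded == toy)
```

Applied to the real dictionary, an empty result from `annotation_problems(bigg)` would indicate that no nested-list annotations are left.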

In [9]:
model

| | |
|---|---|
| Name | bigg_universal |
| Memory address | 7fd3d8185210 |
| Number of metabolites | 15638 |
| Number of reactions | 28301 |
| Number of genes | 0 |
| Number of groups | 0 |
| Objective expression | 0.0 |
| Compartments | cytoplasm, extracellular, periplasm, mitochondrion, peroxisome, unknown, nucleus, vacuole, golgi, thylakoid, lysosome, chloroplast, eyespot, flagellum, mitochondrial intermembrane space, unknown, unknown, unknown, unknown, mitochondrial membrane, cell wall, unknown |


In [10]:
model.metabolites[0].annotation

{'kegg.compound': ['C01468'],
 'chebi': ['CHEBI:11981',
  'CHEBI:17847',
  'CHEBI:1816',
  'CHEBI:20352',
  'CHEBI:44726'],
 'hmdb': ['HMDB01858', 'HMDB13762'],
 'inchikey': ['IWDCLRJOBJJRNH-UHFFFAOYSA-N'],
 'biocyc': ['META:CPD-108'],
 'metanetx.chemical': ['MNXM828'],
 'seed.compound': ['cpd01042']}

In [11]:
model.objective = model.reactions.BIOMASS_reaction
model.optimize()

| | fluxes | reduced_costs |
|---|---|---|
| DM_4crsol_c | -0.000000 | 0.000000e+00 |
| DM_aacald_c | 0.000000 | 0.000000e+00 |
| DM_amob_c | 52.716627 | 0.000000e+00 |
| BIOMASS_Ec_iJO1366_core_53p95M | 0.000000 | -5.752206e+00 |
| EX_12ppd__S_e | 0.000000 | 0.000000e+00 |
| ... | ... | ... |
| EX_LPS30__L_e | -0.000000 | 0.000000e+00 |
| GTLOA38 | 0.000000 | -7.105427e-15 |
| GTGAL13RMN | 0.000000 | 0.000000e+00 |
| DM_LPS9_46_27_ST_p | -0.000000 | 0.000000e+00 |


Looks like everything is good :)