MDLink

extrapolation of potential Metabolite-Disease Associations by Mining biomedical knowledge

Welcome to MDLink!

As metabolomics becomes increasingly prevalent, a growing number of metabolite biomarkers are being identified in various diseases. However, the roles these metabolites play in diseases and their mechanisms of action are still subjects of ongoing exploration. MDLink is designed to deduce potential roles of under-researched metabolites linked to various diseases based on prior knowledge from different aspects, which eventually paves the way for in-depth future research.

Pipeline Overview

pipeline

Citation

MDLink is openly accessible to all users and does not require registration.

If MDLink is utilized in your research, kindly cite our publication:

Huimin Zheng, et al. (2022). In silico method to maximise the biological potential of understudied metabolomic biomarkers: a study in pre-eclampsia.
Step 0: select the analysis module
Analysis of a specific metabolite

Users can input the name or ID of a metabolite of interest to explore its potential functions and biological associations with target diseases.
Batch analysis of multiple metabolites

Users can input a list of metabolites to explore potential biological associations with a target disease. This list can consist of the significantly differentially abundant metabolites identified by the Biomarker discovery module.
Step 1: input the metabolite of interest

Step 2: input the target disease
Note: please select one if multiple relevant diseases are displayed.

The user opted to skip disease specification in Step 2. As a result, the Disease-related Genes/Proteins Over-Representation Analysis (ORA) will not be conducted.

Step 3: set parameters for the three branches

Define the branch





Define the target proteins



Interaction Network


Define the structurally-similar metabolites



Interaction Network


Define the co-abundant metabolites


Note: This section requires the results of a weighted correlation network analysis (WGCNA), with metabolites in the first column and modules in the second column. The co-abundant metabolites (i.e., the members of the same abundance module) are then matched automatically to the metabolite of interest you entered.

Interaction Network


Define metabolite-related proteins/genes


Step 4: Select Databases for Enrichment Analysis

Database Selection




Note: Both TERM2GENE and TERM2NAME files should be tab-separated.
TERM2GENE format: Term ID <tab> Gene ID
TERM2NAME format: Term ID <tab> Term Name
The Term IDs must match between the two files.

Note: Custom databases should be tab-separated with two columns:
Column 1: Term ID
Column 2: Term Description
Step 1: input multiple metabolites
Note: Multiple mixed naming and identification systems for metabolites are allowed.
Prior to analysis, balance potential confounders across the two groups to enhance inter-group comparability.

Upload abundance table and metadata



Normalization



Set parameters for differential abundance analysis



Set thresholds to define metabolite biomarkers


Step 2: input the target disease
Note: please select one if multiple relevant diseases are displayed.


Step 3: set parameters for three branches

Define the target proteins


Define the structurally-similar metabolites



Define the co-abundant metabolites


Note: The abundance table for WGCNA should have samples in rows and metabolites in columns.
Note: This section requires the results of a weighted correlation network analysis (WGCNA), with metabolites in the first column and modules in the second column. The co-abundant metabolites (i.e., the members of the same abundance module) are then matched automatically to the biomarkers you entered.

Interaction Network


Table of Contents

  • 1. About the MDLink
  • 2. About the Search Page
    • 2.1 Analysis of a specific metabolite
      • Step 0: Select Module
      • Step 1: Input Metabolite
      • Step 2: Input Target Disease
      • Step 3: Set Parameters
        • 3.1 The target proteins branch
        • 3.2 The structurally-similar metabolites branch
        • 3.3 The co-abundant metabolites branch
        • 3.4 The user defined branch
    • 2.2 Batch Analysis of Multiple Metabolites
      • Step 0: Select Module
      • Step 1: Input Metabolites/Upload Data
      • Step 2: Input Target Disease
      • Step 3: Set parameters for three branches

1. About the MDLink


MDLink is designed to explore the biological relevance of understudied metabolites and their potential links with diseases, enabling evidence-based prioritization of metabolic biomarkers for further investigation. MDLink consists of three branches, each grounded in a specific biological assumption, which predict potential interaction proteins/genes for metabolites of interest, identify the pathways involved, and infer potential disease associations linked to these metabolites.


HomePage

The default database integrates multiple data sources across four main categories:

  • Interaction Networks: STRING and STITCH for protein-protein and metabolite-protein interactions.
  • Protein and Metabolite Information: STRING, STITCH, and PubChem for comprehensive annotations of proteins and metabolites.
  • Disease-Related Genes: Disease Ontology (DO), DisGeNET, and NCG.
  • Biological Annotations: Gene Ontology (including Molecular Function, Cellular Component, and Biological Process), KEGG, WikiPathways, and Reactome pathways.

The database encompasses approximately xxxx metabolites, 21,300 proteins, xxx protein-protein interactions, xxx metabolite-protein interactions, xxx diseases, xxx GO terms, xxx KEGG pathways, xxx WikiPathways, and xxx Reactome pathways.

2. About the Search Page


On the Search page, users have the option to analyze a single metabolite or perform batch analysis of multiple metabolites. Metabolites of interest can be explored through individual or combined analyses across three branches: the target proteins branch, the structurally similar metabolites branch, and the co-abundant metabolites branch. The results for each selected branch will be displayed and summarized separately.

Upon completion of each step, corresponding summaries and results are automatically generated, presented as downloadable figures and interactive tables. Figures include scaling controls in the lower-right corner, while tables support horizontal scrolling, column-specific sorting, and keyword search functionality.

Icon Functions:
[icon]: provides explanatory notes.
#1 #2 #3: supply example datasets.
[icon]: enables data download.

2.1 Analysis of a specific metabolite

Step 0 select module

In Step 0, select the "Analysis of a specific metabolite" module to start the analysis.


SingleSearchPage
Step 1 input metabolite

To begin analyzing a specific metabolite, users can search by metabolite name or ID. All synonyms and identifiers are supported, including common names and multiple identification systems (e.g., KEGG IDs, HMDB IDs, CAS numbers, ChEBI IDs, and all aliases from PubChem). For example, each of the following terms can be used to search for arachidonic acid:

Common name:
arachidonic acid
PubChem CID:
CID444899
KEGG compound ID:
C00219
ChEBI:
CHEBI:137828
HMDB:
HMDB0001043
CAS-RN:
506-32-1
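
The identifier systems listed above can usually be told apart by their shape. The sketch below is illustrative only: the regular expressions and the function name `classify_identifier` are assumptions made for this example, not MDLink's actual matching logic, and real identifiers may have variants that these patterns do not cover.

```python
import re

# Hypothetical patterns for the identifier systems listed in the tutorial.
# They are assumptions for illustration, not the server's matching rules.
ID_PATTERNS = [
    ("PubChem CID", re.compile(r"^CID\d+$")),
    ("KEGG compound ID", re.compile(r"^C\d{5}$")),
    ("ChEBI", re.compile(r"^CHEBI:\d+$")),
    ("HMDB", re.compile(r"^HMDB\d{7}$")),
    ("CAS-RN", re.compile(r"^\d{2,7}-\d{2}-\d$")),
]


def classify_identifier(query: str) -> str:
    """Return the identifier system that a query string looks like."""
    text = query.strip()
    for system, pattern in ID_PATTERNS:
        if pattern.match(text):
            return system
    # Anything that matches no ID pattern is treated as a name or synonym.
    return "Common name"


for q in ["arachidonic acid", "CID444899", "C00219",
          "CHEBI:137828", "HMDB0001043", "506-32-1"]:
    print(f"{q!r:20} -> {classify_identifier(q)}")
```

Trying the ID patterns first and falling back to "Common name" mirrors the idea that any string not shaped like an ID is looked up as a synonym.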

InputMetabolite

After a metabolite is entered, a brief summary of the query result is shown (arachidonic acid is used as the example here).


ResultOfMetabolite
Step 2 input target disease

The target disease is used to obtain the proteins associated with that disease. An over-representation analysis (ORA) is then performed to determine whether these disease-related proteins are statistically over-represented among the proteins predicted to interact with the queried metabolite. When a disease is selected from the list with its radio button (e.g., DOID:8778), the corresponding disease-related proteins are passed to the ORA automatically.
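
An ORA of this kind is commonly computed as a one-sided hypergeometric test. The following is a minimal sketch under that assumption; the function name `ora_p_value`, the background size, and the counts in the usage line are hypothetical and are not taken from MDLink.

```python
from math import comb


def ora_p_value(universe: int, disease: int, predicted: int, overlap: int) -> float:
    """One-sided hypergeometric p-value, P(X >= overlap).

    universe:  number of background proteins (assumed, not MDLink's value)
    disease:   number of disease-related proteins in the background
    predicted: number of proteins predicted to interact with the metabolite
    overlap:   number of predicted proteins that are also disease-related
    """
    upper = min(disease, predicted)
    total = comb(universe, predicted)
    tail = sum(
        comb(disease, k) * comb(universe - disease, predicted - k)
        for k in range(overlap, upper + 1)
    )
    return tail / total


# Hypothetical counts, for illustration only.
p = ora_p_value(universe=20000, disease=300, predicted=50, overlap=5)
print(f"ORA p-value: {p:.3g}")
```

A small p-value suggests that the overlap shown in the Venn diagram is larger than expected by chance, given the assumed background.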


QueryDiseaseGenes

Users can also manually enter a list of disease-related proteins, as protein names or NCBI Entrez Gene IDs, using the second option.


ManuallyInputDiseaseGenes

Alternatively, if there is no specific disease to be studied, users can opt to skip this section using the third option, in which case disease-related proteins ORA will not be performed in the later analyses.


SkipDiseaseGenes
Step 3 set parameters for the analytic branches

3.1 The target proteins branch

By selecting the first analytic module in Step 3, users can use the target proteins branch.

Within this module, users can either manually input target proteins associated with the query metabolite (no confidence score required) or retrieve target proteins from the default database, in which case a minimum confidence score must be set (in the "Define the Target Proteins" panel). The resulting target proteins are then used to retrieve potential interaction proteins/genes that pass a confidence score threshold (set in the "Interaction Network" panel). Confidence scores range from 0 to 1, with a default threshold of 0.7.
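
Conceptually, the confidence threshold acts as a filter on scored interactions. The sketch below is a hypothetical illustration: the `Interaction` type, the placeholder names, and the scores are invented for the example, and it assumes that an interaction is kept when its score is at least the threshold.

```python
from typing import NamedTuple


class Interaction(NamedTuple):
    source: str
    target: str
    score: float  # confidence score between 0 and 1


def filter_by_confidence(edges, threshold=0.7):
    """Keep interactions whose score is at least the threshold."""
    if not 0.0 <= threshold <= 1.0:
        raise ValueError("threshold must lie between 0 and 1")
    return [e for e in edges if e.score >= threshold]


# Placeholder interactions; names and scores are not database values.
edges = [
    Interaction("metabolite X", "PROTEIN_A", 0.95),
    Interaction("metabolite X", "PROTEIN_B", 0.70),
    Interaction("metabolite X", "PROTEIN_C", 0.40),
]

for edge in filter_by_confidence(edges):
    print(edge.source, "->", edge.target, edge.score)
```

Raising the threshold yields fewer but better-supported interaction partners; lowering it widens the network at the cost of more uncertain edges.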


TgtpSetParams

The matched target proteins, their potential interacting genes/proteins, and the associated interaction relationships will be presented in tables. Furthermore, the interaction data is available for download in edges-and-nodes format, enabling visualization in network analysis tools such as Cytoscape.
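
For readers unfamiliar with an edges-and-nodes layout, the sketch below shows one plausible shape for such files: an edge table with one interaction per row and a node table with one entity per row. The column names (`source`, `target`, `score`, `id`) are assumptions for illustration; the files downloaded from MDLink may use different headers.

```python
import csv
import io


def to_edge_and_node_tables(edges):
    """Build tab-separated edge and node tables from (source, target, score) tuples."""
    edge_buf, node_buf = io.StringIO(), io.StringIO()
    edge_writer = csv.writer(edge_buf, delimiter="\t", lineterminator="\n")
    edge_writer.writerow(["source", "target", "score"])

    nodes = []  # unique node names, in order of first appearance
    for source, target, score in edges:
        edge_writer.writerow([source, target, score])
        for name in (source, target):
            if name not in nodes:
                nodes.append(name)

    node_writer = csv.writer(node_buf, delimiter="\t", lineterminator="\n")
    node_writer.writerow(["id"])
    for name in nodes:
        node_writer.writerow([name])
    return edge_buf.getvalue(), node_buf.getvalue()


# Placeholder interactions, not database values.
edges_tsv, nodes_tsv = to_edge_and_node_tables(
    [("PROTEIN_A", "PROTEIN_B", 0.9), ("PROTEIN_A", "PROTEIN_C", 0.8)]
)
print(edges_tsv)
print(nodes_tsv)
```

Tables of this shape can be loaded into a network tool by mapping the first two edge columns to the source and target nodes.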


TgtpNetworks

The results of the disease-related proteins ORA will be displayed (using Crohn's disease and arachidonic acid as an example). The Venn diagram shows the overlap between the predicted metabolite-related proteins and the disease-related proteins. Download options for both the figure and table are provided.


TgtpDisgenesORA

The predicted interaction proteins/genes will be used to perform term enrichment analysis in Step 4. Users can select standard databases (e.g., Gene Ontology, KEGG Pathways, WikiPathways, Reactome Pathways, Disease Ontology) for enrichment analysis. If the provided databases do not meet specific needs, users can upload custom databases by submitting tab-separated TERM2GENE and TERM2NAME files (examples are provided in #1). These files must follow specific formats: TERM2GENE pairs Term IDs with Gene IDs, while TERM2NAME maps Term IDs to descriptive names.
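
Before uploading custom TERM2GENE and TERM2NAME files, it can help to check locally that both are two-column, tab-separated tables and that their Term IDs agree. The sketch below is one way to do this; the function names are hypothetical, the example IDs are placeholders, and it assumes the files have no header row.

```python
import csv
import io


def read_two_column_tsv(text):
    """Parse tab-separated text into (first, second) pairs, skipping blank lines."""
    rows = []
    reader = csv.reader(io.StringIO(text), delimiter="\t")
    for line_no, row in enumerate(reader, start=1):
        if not row:
            continue
        if len(row) != 2:
            raise ValueError(
                f"line {line_no}: expected 2 tab-separated columns, got {len(row)}"
            )
        rows.append((row[0].strip(), row[1].strip()))
    return rows


def check_custom_database(term2gene_text, term2name_text):
    """Report Term IDs that appear in only one of the two files."""
    gene_terms = {term for term, _gene in read_two_column_tsv(term2gene_text)}
    name_terms = {term for term, _name in read_two_column_tsv(term2name_text)}
    return {
        "missing_names": sorted(gene_terms - name_terms),
        "unused_names": sorted(name_terms - gene_terms),
    }


# Placeholder content: Term ID <tab> Gene ID, and Term ID <tab> Term Name.
term2gene = "TERM1\tGENE1\nTERM1\tGENE2\nTERM2\tGENE3\n"
term2name = "TERM1\tFirst term\nTERM3\tThird term\n"
print(check_custom_database(term2gene, term2name))
```

Two empty lists in the report mean that every Term ID is present in both files.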


TgtpEnrichment

The results of the enrichment analysis will be displayed in a table with a global search box at the top-right and filters under the header for domain-specific browsing. These tools enable users to focus on specific terms and view statistical details. Ticking the square icon next to a term name selects that term for visualization. The selected terms are presented as a composite diagram comprising a Sankey plot and a dot plot. The Sankey plot on the left shows connections between metabolite-targeted genes, interaction genes/proteins, and selected terms. The dot plot on the right illustrates the number and proportion of predicted genes/proteins involved with the enriched terms, along with the adjusted p-values indicating statistical significance. The plot can be zoomed in/out using the controls in the lower-right corner.
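
This tutorial does not state which method produces the adjusted p-values. Purely as an illustration, the sketch below applies the Benjamini-Hochberg procedure, a common choice for enrichment results; the function name and the input values are hypothetical, and MDLink may use a different correction.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(p_values)
    # Indices of the p-values, sorted from smallest to largest p-value.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so that adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical raw p-values for three enriched terms.
raw = [0.01, 0.04, 0.03]
for p, q in zip(raw, benjamini_hochberg(raw)):
    print(f"p = {p:.3f}  adjusted = {q:.3f}")
```

Adjusted values are never smaller than the raw ones, which is why a term can be nominally significant yet fail the adjusted cutoff.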


TgtpEnrichment

Users can also use custom databases separately for enrichment analysis.


TgtpEnrichment
3.2 The structurally-similar metabolites branch

By selecting the second analytic module in Step 3, users can use the structurally similar metabolites branch.

Within this module, users can either manually input structurally similar metabolites associated with the query metabolite or retrieve structurally similar metabolites (with a Tanimoto score >= 90%) from the default database. The resulting structurally similar metabolites are then used to retrieve potential interaction proteins/genes that pass a confidence score threshold (set in the "Interaction Network" panel). Confidence scores range from 0 to 1, with a default threshold of 0.7.
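
For context, a Tanimoto score between two molecules is typically computed on binary structural fingerprints as the number of shared features divided by the number of features present in either molecule. The sketch below represents each fingerprint as a set of "on" bit positions; the fingerprints shown are made up, and how MDLink's default database derives its scores is not described here.

```python
def tanimoto(fp_a: set, fp_b: set) -> float:
    """Tanimoto coefficient of two fingerprints given as sets of on-bit indices."""
    if not fp_a and not fp_b:
        return 0.0  # convention chosen here for two empty fingerprints
    shared = len(fp_a & fp_b)
    return shared / (len(fp_a) + len(fp_b) - shared)


def is_structurally_similar(fp_a: set, fp_b: set, threshold: float = 0.9) -> bool:
    """Apply the >= 90% cutoff mentioned in the tutorial."""
    return tanimoto(fp_a, fp_b) >= threshold


# Made-up fingerprints: 20 bits versus the same bits with one missing.
query_fp = set(range(20))
candidate_fp = set(range(19))
print(tanimoto(query_fp, candidate_fp))
print(is_structurally_similar(query_fp, candidate_fp))
```

With a 90% cutoff, only near-identical fingerprints qualify, so the list of structurally similar metabolites is usually short.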


SsimmSetParams

The matched structurally similar metabolites, their potential interacting genes/proteins, and the associated interaction relationships will be presented in tables. Furthermore, the interaction data is available for download in edges-and-nodes format, enabling visualization in network analysis tools such as Cytoscape.


SsimmNetworks

The results of the disease-related proteins ORA will be displayed (using Crohn's disease and arachidonic acid as an example). The Venn diagram shows the overlap between the predicted structurally similar metabolite-related proteins and the disease-related proteins. Download options for both the figure and table are provided.


SsimmDisgenesORA

The predicted interaction proteins/genes can be used to perform term enrichment analysis in Step 4. Users can select standard databases (e.g., Gene Ontology, KEGG Pathways, WikiPathways, Reactome Pathways, Disease Ontology) for enrichment analysis. If the provided databases do not meet specific needs, users have the option to upload custom databases by submitting tab-separated TERM2GENE and TERM2NAME files. These files must adhere to specific formats: TERM2GENE requires Term IDs paired with Gene IDs, while TERM2NAME maps Term IDs to descriptive names.


TgtpEnrichment

The results of the enrichment analysis will be displayed in a table with a global search box at the top-right and filters under the header for domain-specific browsing. These tools enable users to focus on specific terms and view statistical details. Ticking the square icon next to a term name selects that term for visualization. The selected terms are presented as a composite diagram comprising a Sankey plot and a dot plot. The left-sided Sankey plot shows connections between structurally similar metabolites, interaction genes/proteins, and selected terms. The right-sided dot plot illustrates the number and proportion of predicted genes/proteins involved with the enriched terms, along with the adjusted p-values indicating statistical significance. The plot can be zoomed in/out using the controls in the lower-right corner.


SsimmEnrichment
3.3 The co-abundant metabolites branch

By selecting the third analytic module in Step 3, users can use the co-abundant metabolites branch.

Within this module, users have three options to obtain metabolites that co-vary with the queried metabolite: (i) upload a metabolic abundance table to identify the co-abundant metabolites associated with the queried metabolite using Weighted Gene Co-expression Network Analysis (WGCNA), (ii) directly upload precomputed WGCNA results containing metabolites and modules, or (iii) manually input metabolites that co-vary with the queried metabolite. The resulting co-abundant metabolites, including the queried metabolite, are then used to retrieve potential interaction proteins/genes that pass a confidence score threshold (set in the "Interaction Network" panel). Confidence scores range from 0 to 1, with a default threshold of 0.7.
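
For option (ii), matching co-abundant metabolites amounts to looking up the module of the queried metabolite and collecting every metabolite assigned to the same module. The following minimal sketch assumes the precomputed results are (metabolite, module) pairs; the function name, metabolite names, and module labels are hypothetical.

```python
def co_abundant_metabolites(wgcna_rows, query):
    """Return all metabolites in the same module as the query, query included.

    wgcna_rows: iterable of (metabolite, module) pairs, i.e. the first and
    second columns of a precomputed WGCNA result table.
    """
    module_of = {metabolite: module for metabolite, module in wgcna_rows}
    if query not in module_of:
        return []  # the query was not part of the WGCNA input
    target_module = module_of[query]
    return [m for m, module in module_of.items() if module == target_module]


# Placeholder WGCNA results; module labels are arbitrary.
rows = [
    ("arachidonic acid", "module 1"),
    ("metabolite A", "module 1"),
    ("metabolite B", "module 2"),
]
print(co_abundant_metabolites(rows, "arachidonic acid"))
```

The queried metabolite is part of the returned list, in line with the statement that it is included among the co-abundant metabolites.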


CoabmSetParams

When identifying abundance-correlated metabolites (i.e., a consistent abundance module) using WGCNA, it is important to use the correct input file format and to set the analysis parameters properly. The input file should be in TXT or TSV format, with metabolites as columns and samples as rows. An example abundance table is provided as a format reference.


CoabmWGCNAData

The WGCNA process can be divided into five steps, as outlined in the WGCNA tutorial: (a) choosing the soft-thresholding power; (b) calculating co-expression similarity and adjacency; (c) calculating the topological overlap matrix (TOM); (d) clustering using the TOM; (e) merging modules whose expression profiles are very similar. Several important parameters can be customized: the network type (signed, unsigned, or signed hybrid) in steps a and b, the correlation method (Spearman or Pearson) in step b, the minimum module size (default: 5) in step d, and the module merging threshold (i.e., 1-TOM dissimilarity, default TOM dissimilarity cutoff: 0.25) in step e.
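
To make steps (b) and (c) more concrete, the sketch below computes an unsigned adjacency matrix and the corresponding topological overlap for a tiny, made-up abundance table. It is a simplified pure-Python illustration that assumes Pearson correlation, an unsigned network, and an arbitrary soft-thresholding power of 6; it is not the WGCNA package itself, and the function names are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length vectors (must not be constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def unsigned_adjacency(columns, power):
    """Step b: adjacency a_ij = |cor(i, j)| ** power, one column per metabolite."""
    n = len(columns)
    return [
        [1.0 if i == j else abs(pearson(columns[i], columns[j])) ** power
         for j in range(n)]
        for i in range(n)
    ]


def topological_overlap(adj):
    """Step c: unsigned TOM computed from an adjacency matrix."""
    n = len(adj)
    # Connectivity of each node, excluding its self-connection.
    k = [sum(adj[i][u] for u in range(n) if u != i) for i in range(n)]
    tom = [[1.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            shared = sum(adj[i][u] * adj[u][j] for u in range(n) if u not in (i, j))
            tom[i][j] = (shared + adj[i][j]) / (min(k[i], k[j]) + 1.0 - adj[i][j])
    return tom


# Made-up abundances: three metabolites measured in five samples.
columns = [[1, 2, 3, 4, 5], [2, 4, 6, 8, 10], [5, 3, 4, 1, 2]]
adjacency = unsigned_adjacency(columns, power=6)
tom = topological_overlap(adjacency)
dissimilarity = [[1.0 - value for value in row] for row in tom]
for row in tom:
    print([round(value, 3) for value in row])
```

The dissimilarity matrix (1 - TOM) is what the clustering in step (d) operates on: metabolites with high topological overlap end up close together and tend to fall into the same module.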


CoabmWGCNAParams

The results of WGCNA will be presented in a table.


CoabmWGCNA

The identified co-abundant metabolites, their potential interacting genes/proteins, and the associated interaction relationships will be presented in tables. Furthermore, the interaction data is available for download in edges-and-nodes format, enabling visualization in network analysis tools such as Cytoscape.


CoabmNetworks

The results of the disease-related proteins ORA will be displayed (using Crohn's disease and arachidonic acid as an example). The Venn diagram shows the overlap between the predicted co-abundant metabolite-related proteins and the disease-related proteins. Download options for both the figure and table are provided.


CoabmDisgenesORA

The predicted interaction proteins/genes can be used to perform term enrichment analysis in Step 4. Users can select standard databases (e.g., Gene Ontology, KEGG Pathways, WikiPathways, Reactome Pathways, Disease Ontology) for enrichment analysis. If the provided databases do not meet specific needs, users have the option to upload custom databases by submitting tab-separated TERM2GENE and TERM2NAME files. These files must adhere to specific formats: TERM2GENE requires Term IDs paired with Gene IDs, while TERM2NAME maps Term IDs to descriptive names.


TgtpEnrichment

The results of the enrichment analysis will be displayed in a table with a global search box at the top-right and filters under the header for domain-specific browsing. These tools enable users to focus on specific terms and view statistical details. Ticking the square icon next to a term name selects that term for visualization. The selected terms are presented as a composite diagram comprising a Sankey plot and a dot plot. The left-sided Sankey plot shows connections between co-abundant metabolites, interaction genes/proteins, and selected terms. The right-sided dot plot illustrates the number and proportion of predicted genes/proteins involved with the enriched terms, along with the adjusted p-values indicating statistical significance. The plot can be zoomed in/out using the controls in the lower-right corner.


CoabmEnrichment
3.4 The user defined branch

By selecting the fourth analytic module in Step 3, users can use the user defined branch.

This feature allows users to manually input gene names linked to metabolites. The entered genes will be incorporated as independent entities into subsequent network construction and pathway enrichment analyses.


UserSetParams

The user defined metabolite-related proteins/genes and their interaction relationships will be presented in tables. Furthermore, the interaction data is available for download in edges-and-nodes format, enabling visualization in network analysis tools such as Cytoscape.


UserNetworks

The results of the disease-related proteins ORA will be displayed. In this example, a warning was generated because there was no overlap between the user-defined genes/proteins and the disease-related proteins. Additionally, if disease specification was skipped in Step 2, the ORA could not be conducted, resulting in the same warning.


UserDisgenesORA

The matched proteins/genes can be used to perform term enrichment analysis in Step 4. Users can select standard databases (e.g., Gene Ontology, KEGG Pathways, WikiPathways, Reactome Pathways, Disease Ontology) for enrichment analysis. If the provided databases do not meet specific needs, users have the option to upload custom databases by submitting tab-separated TERM2GENE and TERM2NAME files. These files must adhere to specific formats: TERM2GENE requires Term IDs paired with Gene IDs, while TERM2NAME maps Term IDs to descriptive names.


TgtpEnrichment

The results of the enrichment analysis will be displayed in a table with a global search box at the top-right and filters under the header for domain-specific browsing. These tools enable users to focus on specific terms and view statistical details. Ticking the square icon next to a term name selects that term for visualization. The selected terms are presented as a composite diagram comprising a Sankey plot and a dot plot. The left-sided Sankey plot shows connections between the user-defined metabolite-related proteins/genes, interaction genes/proteins, and selected terms. The right-sided dot plot illustrates the number and proportion of genes/proteins involved with the enriched terms, along with the adjusted p-values indicating statistical significance. The plot can be zoomed in/out using the controls in the lower-right corner.


UserEnrichment

2.2 Batch analysis of multiple metabolites

Step 0 select module

In Step 0, select the "Batch analysis of multiple metabolites" module to start the analysis.


batchSearchPage
Step 1 input metabolites/upload data

To perform analyses on multiple metabolites, the first step involves inputting the relevant metabolite information. We provide two options for data upload:

(i) input a list of metabolites for downstream analysis; names and identifiers from different systems can be mixed in the same list;


InputMetabolitesList

(ii) alternatively, use the significantly differentially abundant metabolites identified by the Biomarker discovery module as input for downstream analysis.

To define the metabolite biomarkers, a metabolic abundance table and a corresponding metadata file containing sample grouping information are required. The metabolite abundance table supports multiple naming conventions and identification systems, which can be used concurrently. As a reference, an example dataset including metabolomic profiles and metadata from individuals with Crohn's disease and non-IBD controls (PMID: 30531976) is available for download.

Users can upload their own files by selecting the “Browse” button to locate the desired files and clicking “Upload” to initiate the upload process.


UploadData

The uploaded data will undergo transformation and scaling. Choose the appropriate methods, and then click “Proceed” to continue.


NormData

Next, choose the grouping information from the uploaded metadata and set parameters for differential abundance analysis, then click “Run” to move forward.


DAA

Now, set the statistical thresholds used to identify significantly differentially abundant metabolites, then click “Filter” to continue.
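
As a rough picture of what such thresholds do, the sketch below keeps metabolites that pass cutoffs on a p-value, an absolute log2 fold change, and a VIP score. The choice of these three statistics, the cutoff values, the class and function names, and the example numbers are all assumptions for illustration; the options offered by the server may differ.

```python
from dataclasses import dataclass


@dataclass
class MetaboliteStats:
    name: str
    p_value: float
    log2_fold_change: float
    vip: float


def select_biomarkers(stats, max_p=0.05, min_abs_log2fc=1.0, min_vip=1.0):
    """Return names of metabolites that pass all three hypothetical cutoffs."""
    return [
        s.name
        for s in stats
        if s.p_value <= max_p
        and abs(s.log2_fold_change) >= min_abs_log2fc
        and s.vip >= min_vip
    ]


# Made-up statistics for three placeholder metabolites.
stats = [
    MetaboliteStats("metabolite A", p_value=0.001, log2_fold_change=1.8, vip=1.5),
    MetaboliteStats("metabolite B", p_value=0.200, log2_fold_change=2.1, vip=1.2),
    MetaboliteStats("metabolite C", p_value=0.010, log2_fold_change=-1.3, vip=1.1),
]
print(select_biomarkers(stats))
```

Using the absolute fold change lets both increased and decreased metabolites qualify as biomarkers.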


FindBiomarkers

Statistical results for all metabolites will be presented in a table and are available for download. Only metabolites identified as significantly differentially abundant will be carried forward into subsequent analysis. Summaries will also be provided that describe the analytical methodology, including the data structure, the preprocessing techniques, and the statistical criteria used to identify differentially abundant metabolites.


ResultOfBiomarkers
Step 2 input target disease

The target disease is used to obtain the proteins associated with that disease. An over-representation analysis (ORA) is then performed to determine whether these disease-related proteins are statistically over-represented among the proteins predicted to interact with the query metabolite. When a disease is selected from the list with its radio button (e.g., DOID:8778), the corresponding disease-related proteins are loaded into the ORA automatically.


QueryDiseaseGenes

Users can also manually enter a list of disease-related proteins, as protein names or NCBI Entrez Gene IDs, using the second option.


ManuallyInputDiseaseGenes

Alternatively, if there is no specific disease to be studied, users can opt to skip this section using the third option, in which case disease-related proteins ORA will not be performed in the later analyses.


SkipDiseaseGenes
Step 3: set parameters for three branches

The analysis scope depends on the input selection made in Step 1:

  • If the input is a list of metabolite names or IDs, the server will conduct analyses on all metabolites;
  • If the input is derived from the Biomarker discovery module, only those metabolites identified as significantly differentially abundant are subjected to further analysis.

Set the parameters for the three analytic branches:

  • The target proteins branch: Target proteins will be retrieved from the default database using a user-defined minimum confidence score.
  • The structurally similar metabolites branch: Metabolites with a Tanimoto score >= 90% in the default database are considered structurally similar metabolites.
  • The co-abundant metabolites branch: This branch offers three options: (i) upload a metabolic abundance table to identify the co-abundant metabolites using Weighted Gene Co-expression Network Analysis (WGCNA), (ii) directly upload precomputed WGCNA results containing metabolites and modules, or (iii) skip this branch by selecting the “no co-abundant metabolites” option. For details on the WGCNA settings, see section 3.3 The co-abundant metabolites branch in the Analysis of a specific metabolite module.

    The resulting target proteins, structurally similar metabolites, and co-abundant metabolites will each be used to retrieve potential interaction proteins/genes that pass a confidence score threshold (set in the "Interaction Network" panel). Confidence scores range from 0 to 1, with a default threshold of 0.7.
    Note: This step takes time to complete.


    BatchSetParams

    If the co-abundant metabolites branch is not skipped, the results of WGCNA analysis will be presented in a tabular format.


    BatchWGCNA

    When the significantly differentially abundant metabolites identified by the Biomarker discovery module are used as input in Step 1, a circular plot summarizing key features of these biomarkers will be displayed. The layers of the plot, from inner to outer, are described as follows:

  • layers 1 to 3 (blue/green/red dots) indicate that the predicted interaction proteins of a biomarker are significantly enriched among Crohn's disease (CD)-related genes/proteins in the target proteins branch, the structurally similar metabolites branch, and the co-abundant metabolites branch, respectively;
  • layers 4 to 6 (heatmap) represent the statistical metrics of the biomarkers: Area Under the Curve (AUC), absolute value of log2 fold change, and -log10(P-value);
  • the outermost layer represents the Variable Importance in Projection (VIP) score of each biomarker.

    ResultOfBatchMeDAM

    A summary of disease-related gene statistics for each metabolite, based on ORA results from the three analytic branches, will be shown in a table. To view detailed results and perform further analysis for a specific metabolite, click the circle icon next to the corresponding metabolite name.


    BatchSelMetaboliteSetParams

    Users can search for a specific metabolite using the global search box positioned at the top-right of the interface. Upon locating the desired metabolite (e.g., urobilin), clicking the radio button next to its name will enable further analysis.


    BatchSelMetaboliteSetParams

    The target proteins branch for urobilin will show a warning: “The potential interaction genes/proteins were not found”. This indicates that no target proteins for urobilin are present in the default database.

    BatchSelMetaboliteTgtp

    The results of the structurally similar metabolites branch will be presented, with details as previously described (see "Analysis of a specific metabolite" module).


    BatchSelMetaboliteSsimm

    Similarly, the results of the co-abundant metabolites branch will be presented, with details as previously described (see "Analysis of a specific metabolite" module).


    BatchSelMetaboliteCoabm

    Thank you for using MDLink!


    We appreciate your feedback on our database and webserver. If you have any issues or suggestions, please feel free to contact us.

    We are committed to addressing your concerns and considering your ideas for MDLink's future updates. Your support and interest are greatly appreciated!


    Address:
    Central Laboratory of the Medical Research Center, The First Affiliated Hospital of Ningbo University, Ningbo 315000, China.
    Lab:
    1. Ningbo Key Laboratory of Human Microbiome and Precision Medicine
    2. YuLab-SMU
    Contacts:
    Wenli Tang, fyytangwenli@nbu.edu.cn
    Wenjie Zhu, zhuwenjie@qdu.edu.cn
    Dr. Huimin Zheng, zhenghuimin91@126.com
    Li Zhan, smu18575877413@gmail.com
    Prof. Guangchuang Yu, gcyu1@smu.edu.cn

    Comment and Feedback


    Copyright ©2025, Ningbo Key Laboratory of Human Microbiome and Precision Medicine, The First Affiliated Hospital of Ningbo University. All Rights Reserved.
    苏ICP备2025193867号