As metabolomics becomes increasingly prevalent, a growing number of metabolite biomarkers are being identified in various diseases. However, the roles these metabolites play in diseases and their mechanisms of action are still subjects of ongoing exploration. MDLink is designed to deduce potential roles of under-researched metabolites linked to various diseases based on prior knowledge from different aspects, which eventually paves the way for in-depth future research.
If MDLink is utilized in your research, kindly cite our publication:
The user opted to skip disease specification in Step 2. As a result, the Disease-related Genes/Proteins Over-Representation Analysis (ORA) will not be conducted.
MDLink is designed to explore the biological relevance of understudied metabolites and their potential links with diseases, enabling evidence-based prioritization of metabolic biomarkers for further investigation. MDLink consists of three branches, each grounded in a specific biological assumption, to predict potential interaction proteins/genes for metabolites of interest, identify involved pathways and infer potential disease associations linked to these metabolites.
The default database integrates multiple data sources across four main categories:
The database encompasses approximately xxxx metabolites, 21,300 proteins, xxx protein-protein interactions, xxx metabolite-protein interactions, xxx diseases, xxx GO terms, xxx KEGG pathways, xxx WikiPathways, and xxx Reactome pathways.
On the Search page, users have the option to analyze a single metabolite or perform batch analysis of multiple metabolites. Metabolites of interest can be explored through individual or combined analyses across three branches: the target proteins branch, the structurally similar metabolites branch, and the co-abundant metabolites branch. The results for each selected branch will be displayed and summarized separately.
Upon completion of each step, corresponding summaries and results are automatically generated, presented as downloadable figures and interactive tables. Figures include scaling controls in the lower-right corner, while tables support horizontal scrolling, column-specific sorting, and keyword search functionality.
Icon Functions:
: provide explanatory notes.
#1 #2 #3 : supply example datasets.
: enable data download.
In Step 0, select the "Analysis of a specific metabolite" module to start the analysis.
To begin analyzing a specific metabolite, users can search using either a metabolite name or ID. All synonyms and identifiers are supported, including the common names and multiple identification systems for metabolites (e.g., KEGG ID, HMDB ID, CAS numbers, ChEBI IDs, and all aliases from PubChem). For example, the following terms are all allowed as input to search for arachidonic acid:
After the metabolite is entered, a brief summary of the query result for this metabolite is shown (using arachidonic acid as an example).
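The idea of accepting any synonym or identifier can be sketched as an alias index that maps every known name to one canonical record. This is only a minimal illustration: the `ALIASES` table, the function names, and the case-insensitive matching rule are assumptions, not MDLink's actual implementation, and the identifiers listed are quoted from public databases purely as examples.

```python
# Illustrative alias table; MDLink's real index is not described in this page.
ALIASES = {
    "arachidonic acid": [
        "Arachidonic acid",  # common name
        "C00219",            # KEGG ID
        "HMDB0001043",       # HMDB ID
        "506-32-1",          # CAS number
        "CHEBI:15843",       # ChEBI ID
    ],
}


def normalize(term):
    """Trim whitespace and ignore letter case so lookups are forgiving."""
    return term.strip().lower()


def build_index(aliases):
    """Map every normalized alias to its canonical metabolite name."""
    index = {}
    for canonical, names in aliases.items():
        for name in names:
            index[normalize(name)] = canonical
    return index


def resolve(term, index):
    """Return the canonical name for a query term, or None if unknown."""
    return index.get(normalize(term))


INDEX = build_index(ALIASES)
print(resolve(" hmdb0001043 ", INDEX))
print(resolve("not-a-metabolite", INDEX))
```

A single lookup table keeps mixed naming systems interchangeable: whichever identifier is typed, the same record is retrieved.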
The input of a target disease is used to obtain proteins associated with that specific disease. Then, an over-representation analysis (ORA) will be performed to determine whether the resulting disease-related proteins are statistically over-represented among those predicted to interact with the queried metabolite. When a disease is selected from the list (using the radio button, e.g., DOID:8778), the corresponding disease-related proteins are automatically passed to the ORA.
Users can also manually input a list of disease-related proteins in the second option, using protein names or NCBI Entrez Gene IDs.
Alternatively, if there is no specific disease to be studied, users can opt to skip this section using the third option, in which case disease-related proteins ORA will not be performed in the later analyses.
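As a rough illustration of what an ORA computes, the sketch below runs a one-sided hypergeometric test on the overlap between predicted proteins and disease-related proteins. The function name, the gene labels, and the `background_size` parameter are hypothetical, and the exact test and background used by MDLink are not stated here, so this should be read as a sketch under those assumptions.

```python
from math import comb


def ora_p_value(predicted, disease, background_size):
    """One-sided hypergeometric test for over-representation.

    predicted: proteins predicted to interact with the metabolite
    disease: disease-related proteins
    background_size: assumed size of the protein universe (hypothetical input)
    Both sets are assumed to lie entirely inside the background.
    """
    predicted, disease = set(predicted), set(disease)
    k = len(predicted & disease)  # observed overlap
    n = len(predicted)            # number of draws
    K = len(disease)              # successes in the background
    N = background_size
    total = comb(N, n)
    # P(X >= k): sum the upper tail of the hypergeometric distribution.
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(n, K) + 1))
    return k, tail / total


overlap, p = ora_p_value({"GENE_A", "GENE_B", "GENE_C"},
                         {"GENE_B", "GENE_C", "GENE_D", "GENE_E"},
                         background_size=10)
print(overlap, p)
```

A small p-value indicates that the overlap is larger than expected by chance given the sizes of the two protein sets and the background.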
By selecting the first analytic module in Step 3, users can use the target proteins branch.
Within this module, users can either manually input target proteins associated with the query metabolite (no confidence score required) or retrieve target proteins from the default database, in which case a minimum confidence score must be set (in the "Define the Target Proteins" panel). The resulting target proteins will be used to retrieve potential interaction proteins/genes based on the threshold of confidence score (in the "Interaction Network" panel). Confidence score ranges from 0 to 1, with a default threshold of 0.7.
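The effect of the confidence threshold can be sketched as a simple filter over scored interactions. The tuple layout, the protein labels, and the choice to keep scores equal to the threshold are assumptions made for illustration; only the 0 to 1 range and the 0.7 default come from the description above.

```python
DEFAULT_THRESHOLD = 0.7  # default stated in the text


def filter_interactions(interactions, threshold=DEFAULT_THRESHOLD):
    """Keep interactions whose confidence score reaches the threshold.

    interactions: iterable of (source, target, score) tuples (hypothetical layout)
    Scores equal to the threshold are kept; this boundary rule is an assumption.
    """
    if not 0.0 <= threshold <= 1.0:
        raise ValueError("threshold must lie between 0 and 1")
    return [(a, b, s) for a, b, s in interactions if s >= threshold]


scored = [("P1", "P2", 0.95), ("P1", "P3", 0.70), ("P2", "P4", 0.40)]
print(filter_interactions(scored))
print(filter_interactions(scored, threshold=0.9))
```

Raising the threshold yields a smaller, higher-confidence network; lowering it recovers more candidate interactors at the cost of more uncertain edges.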
The matched target proteins, their potential interacting genes/proteins, and the associated interaction relationships will be presented in tables. Furthermore, the interaction data is available for download in edges-and-nodes format, enabling visualization in network analysis tools such as Cytoscape.
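An edges-and-nodes export can be pictured as two tab-separated tables, one listing interactions and one listing the unique entities. The column names (`source`, `target`, `score`, `id`) are hypothetical and may differ from the files MDLink actually produces; the sketch only shows the general shape that tools such as Cytoscape can import.

```python
import csv
import io


def to_edge_and_node_tables(interactions):
    """Serialize (source, target, score) tuples into two TSV strings."""
    edge_buf = io.StringIO()
    edge_writer = csv.writer(edge_buf, delimiter="\t", lineterminator="\n")
    edge_writer.writerow(["source", "target", "score"])  # hypothetical header
    nodes = set()
    for source, target, score in interactions:
        edge_writer.writerow([source, target, score])
        nodes.update((source, target))

    node_buf = io.StringIO()
    node_writer = csv.writer(node_buf, delimiter="\t", lineterminator="\n")
    node_writer.writerow(["id"])  # hypothetical header
    for node in sorted(nodes):
        node_writer.writerow([node])
    return edge_buf.getvalue(), node_buf.getvalue()


edges_tsv, nodes_tsv = to_edge_and_node_tables([("P1", "P2", 0.9), ("P2", "P3", 0.8)])
print(edges_tsv)
print(nodes_tsv)
```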
The results of the disease-related proteins ORA will be displayed (using Crohn's disease and arachidonic acid as an example). The Venn diagram shows the overlap between the predicted metabolite-related proteins and the disease-related proteins. Download options for both the figure and table are provided.
The predicted interaction proteins/genes will be used to perform term enrichment analysis in Step 4. Users can select standard databases (e.g., Gene Ontology, KEGG Pathways, WikiPathways, Reactome Pathways, Disease Ontology) for enrichment analysis. If the provided databases do not meet specific needs, users have the option to upload custom databases by submitting tab-separated TERM2GENE and TERM2NAME files (examples provided in #1). These files must adhere to specific formats: TERM2GENE requires Term IDs paired with Gene IDs, while TERM2NAME maps Term IDs to descriptive names.
The results of the enrichment analysis are displayed in a table with a global search box at the top right and filters under the header for domain-specific browsing. These tools enable users to focus on specific terms and view statistical details. Terms of interest can be selected for visualization by ticking the square icon next to the term name. The selected terms are presented as a composite diagram comprising a Sankey plot and a dot plot. The Sankey plot on the left shows connections between metabolite-targeted genes, interaction genes/proteins, and selected terms. The dot plot on the right illustrates the number and proportion of predicted genes/proteins involved in the enriched terms, along with the adjusted p-values indicating statistical significance. The plot can be zoomed in or out using the controls in the lower-right corner.
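To make the two custom-database files concrete, the sketch below parses a TERM2GENE table (Term ID, Gene ID) and a TERM2NAME table (Term ID, descriptive name) and checks that every term has a name. The term IDs, gene IDs, and the assumption that the files carry no header row are all hypothetical; the example files supplied on the page remain the authoritative reference for the format.

```python
import csv
import io
from collections import defaultdict


def parse_two_column_tsv(text):
    """Yield (first, second) pairs from tab-separated text, skipping blank lines."""
    for row in csv.reader(io.StringIO(text), delimiter="\t"):
        if not row:
            continue
        if len(row) != 2:
            raise ValueError(f"expected 2 tab-separated fields, got {len(row)}: {row}")
        yield row[0], row[1]


def load_custom_database(term2gene_text, term2name_text):
    """Return ({term: set of genes}, {term: name}) and verify every term is named."""
    term2gene = defaultdict(set)
    for term, gene in parse_two_column_tsv(term2gene_text):
        term2gene[term].add(gene)
    term2name = dict(parse_two_column_tsv(term2name_text))
    missing = sorted(set(term2gene) - set(term2name))
    if missing:
        raise ValueError(f"terms without a name: {missing}")
    return dict(term2gene), term2name


# Made-up content, for illustration only.
TERM2GENE = "TERM:0001\t1001\nTERM:0001\t1002\nTERM:0002\t1003\n"
TERM2NAME = "TERM:0001\tExample pathway A\nTERM:0002\tExample pathway B\n"

genes, names = load_custom_database(TERM2GENE, TERM2NAME)
print(genes)
print(names)
```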
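The adjusted p-values shown in the dot plot come from a multiple-testing correction. The page does not state which correction is applied, so the sketch below assumes the Benjamini-Hochberg procedure, a common choice for enrichment analysis; the function name and the input values are illustrative.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # indices by ascending p
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


raw = [0.01, 0.04, 0.03, 0.5]
print(benjamini_hochberg(raw))
```

Each raw p-value is scaled by the number of tests divided by its rank, and the running minimum ensures that adjusted values never decrease as the raw p-values increase.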
Users can also use custom databases separately for enrichment analysis.
By selecting the second analytic module in Step 3, users can use the structurally similar metabolites branch.
Within this module, users can either manually input structurally similar metabolites associated with the query metabolite or retrieve structurally similar metabolites (with a Tanimoto score >= 90%) from the default database. The resulting structurally similar metabolites will be used to retrieve potential interaction proteins/genes based on the threshold of confidence score (in the "Interaction Network" panel). Confidence score ranges from 0 to 1, with a default threshold of 0.7.
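The Tanimoto score compares two molecular fingerprints as the size of their intersection divided by the size of their union. The sketch below represents fingerprints as sets of on-bit indices; this representation, the function names, and the toy library are assumptions, and the fingerprint type MDLink uses is not specified here. The 0.9 cutoff mirrors the 90% threshold mentioned above.

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto (Jaccard) similarity between two fingerprints given as bit sets."""
    a, b = set(fp_a), set(fp_b)
    union = a | b
    if not union:
        return 0.0  # two empty fingerprints: treated as dissimilar by convention
    return len(a & b) / len(union)


def similar_metabolites(query_fp, library, cutoff=0.9):
    """Names of library entries whose similarity to the query reaches the cutoff."""
    return sorted(name for name, fp in library.items()
                  if tanimoto(query_fp, fp) >= cutoff)


query = set(range(1, 11))              # bits 1..10 set
library = {
    "metabolite_1": set(range(1, 11)), # identical fingerprint
    "metabolite_2": set(range(1, 10)), # 9 of 10 bits shared
    "metabolite_3": {1, 2},            # little overlap
}
print(similar_metabolites(query, library))
```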
The matched structurally similar metabolites, their potential interacting genes/proteins, and the associated interaction relationships will be presented in tables. Furthermore, the interaction data is available for download in edges-and-nodes format, enabling visualization in network analysis tools such as Cytoscape.
The results of the disease-related proteins ORA will be displayed (using Crohn's disease and arachidonic acid as an example). The Venn diagram shows the overlap between the predicted structurally similar metabolite-related proteins and the disease-related proteins. Download options for both the figure and table are provided.
The predicted interaction proteins/genes can be used to perform term enrichment analysis in Step 4. Users can select standard databases (e.g., Gene Ontology, KEGG Pathways, WikiPathways, Reactome Pathways, Disease Ontology) for enrichment analysis. If the provided databases do not meet specific needs, users have the option to upload custom databases by submitting tab-separated TERM2GENE and TERM2NAME files. These files must adhere to specific formats: TERM2GENE requires Term IDs paired with Gene IDs, while TERM2NAME maps Term IDs to descriptive names.
The results of the enrichment analysis are displayed in a table with a global search box at the top right and filters under the header for domain-specific browsing. These tools enable users to focus on specific terms and view statistical details. Terms of interest can be selected for visualization by ticking the square icon next to the term name. The selected terms are presented as a composite diagram comprising a Sankey plot and a dot plot. The Sankey plot on the left shows connections between structurally similar metabolites, interaction genes/proteins, and selected terms. The dot plot on the right illustrates the number and proportion of predicted genes/proteins involved in the enriched terms, along with the adjusted p-values indicating statistical significance. The plot can be zoomed in or out using the controls in the lower-right corner.
By selecting the third analytic module in Step 3, users can use the co-abundant metabolites branch.
Within this module, users have three options to obtain metabolites that co-vary with the queried metabolite: (i) upload a metabolic abundance table to identify the co-abundant metabolites associated with the queried metabolite using Weighted Gene Co-expression Network Analysis (WGCNA), (ii) directly upload precomputed WGCNA results containing metabolites and modules, or (iii) manually input metabolites that co-vary with the queried metabolite. The resulting co-abundant metabolites, including the queried metabolite, will be used to retrieve potential interaction proteins/genes based on the threshold of confidence score (in the "Interaction Network" panel). Confidence score ranges from 0 to 1, with a default threshold of 0.7.
When identifying abundance-correlated metabolites (i.e., a consistent abundance module) using WGCNA, it is important to use the correct input file format and to set the analysis parameters properly. The input file should be in TXT or TSV format, with metabolites as columns and samples as rows. An example abundance table is provided as a format reference.
The WGCNA process can be divided into five steps, as outlined in the WGCNA tutorial: (a) choosing the soft-thresholding power; (b) calculating co-expression similarity and adjacency; (c) calculating the topological overlap matrix (TOM); (d) clustering using the TOM; (e) merging modules whose expression profiles are very similar. Several important parameters can be customized: the network type (signed, unsigned, or signed hybrid) in steps a and b, the correlation method (Spearman or Pearson) in step b, the minimum module size (default: 5) in step d, and the module merging threshold (i.e., 1-TOM dissimilarity; default TOM dissimilarity cutoff: 0.25) in step e.
The results of WGCNA will be presented in a table.
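The expected layout (samples as rows, metabolites as columns) can be checked with a small parser such as the one below. It assumes that the first column holds sample IDs and the first row holds metabolite names; those details, the sample labels, and the function name are illustrative guesses, so the example table provided on the page should be treated as the definitive format.

```python
import csv
import io


def read_abundance_table(text):
    """Parse a TSV abundance table into sample IDs and per-metabolite value lists."""
    reader = csv.reader(io.StringIO(text), delimiter="\t")
    header = next(reader)
    metabolites = header[1:]  # first header cell labels the sample column
    samples = []
    columns = {m: [] for m in metabolites}
    for row in reader:
        if not row:
            continue  # ignore blank lines
        if len(row) != len(header):
            raise ValueError(f"row for {row[0]!r} has {len(row)} fields, expected {len(header)}")
        samples.append(row[0])
        for metabolite, value in zip(metabolites, row[1:]):
            columns[metabolite].append(float(value))
    return samples, columns


TABLE = (
    "sample\tarachidonic acid\turobilin\n"
    "S1\t10.5\t3.2\n"
    "S2\t12.1\t2.8\n"
)
sample_ids, abundances = read_abundance_table(TABLE)
print(sample_ids)
print(abundances)
```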
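Steps (b) and (c) can be illustrated with a pure-Python sketch for an unsigned network: Pearson correlations are raised to a soft-thresholding power to give the adjacency, and the topological overlap is then derived from shared neighbours. This is a simplified illustration, not the WGCNA package itself; the power value, the metabolite labels, and the function names are assumptions, and profiles with zero variance would cause a division by zero.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equally long abundance profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / sqrt(vx * vy)


def unsigned_adjacency(profiles, power=6):
    """Adjacency a_ij = |cor(i, j)| ** power, with a zero diagonal."""
    names = list(profiles)
    n = len(names)
    adj = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                r = pearson(profiles[names[i]], profiles[names[j]])
                adj[i][j] = abs(r) ** power
    return names, adj


def topological_overlap(adj):
    """TOM_ij = (sum_u a_iu * a_uj + a_ij) / (min(k_i, k_j) + 1 - a_ij)."""
    n = len(adj)
    k = [sum(row) for row in adj]  # connectivity, self excluded via zero diagonal
    tom = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                shared = sum(adj[i][u] * adj[u][j] for u in range(n) if u not in (i, j))
                tom[i][j] = (shared + adj[i][j]) / (min(k[i], k[j]) + 1 - adj[i][j])
    return tom


profiles = {
    "m1": [1.0, 2.0, 3.0, 4.0],
    "m2": [2.0, 4.0, 6.0, 8.0],
    "m3": [1.0, 3.0, 2.0, 4.0],
}
names, adj = unsigned_adjacency(profiles, power=2)
tom = topological_overlap(adj)
for name, row in zip(names, tom):
    print(name, [round(v, 3) for v in row])
```

Clustering in step (d) then operates on the dissimilarity 1 - TOM, so metabolites that share many strongly connected neighbours end up in the same module.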
The identified co-abundant metabolites, their potential interacting genes/proteins, and the associated interaction relationships will be presented in tables. Furthermore, the interaction data is available for download in edges-and-nodes format, enabling visualization in network analysis tools such as Cytoscape.
The results of the disease-related proteins ORA will be displayed (using Crohn's disease and arachidonic acid as an example). The Venn diagram shows the overlap between the predicted co-abundant metabolite-related proteins and the disease-related proteins. Download options for both the figure and table are provided.
The predicted interaction proteins/genes can be used to perform term enrichment analysis in Step 4. Users can select standard databases (e.g., Gene Ontology, KEGG Pathways, WikiPathways, Reactome Pathways, Disease Ontology) for enrichment analysis. If the provided databases do not meet specific needs, users have the option to upload custom databases by submitting tab-separated TERM2GENE and TERM2NAME files. These files must adhere to specific formats: TERM2GENE requires Term IDs paired with Gene IDs, while TERM2NAME maps Term IDs to descriptive names.
The results of the enrichment analysis are displayed in a table with a global search box at the top right and filters under the header for domain-specific browsing. These tools enable users to focus on specific terms and view statistical details. Terms of interest can be selected for visualization by ticking the square icon next to the term name. The selected terms are presented as a composite diagram comprising a Sankey plot and a dot plot. The Sankey plot on the left shows connections between co-abundant metabolites, interaction genes/proteins, and selected terms. The dot plot on the right illustrates the number and proportion of predicted genes/proteins involved in the enriched terms, along with the adjusted p-values indicating statistical significance. The plot can be zoomed in or out using the controls in the lower-right corner.
By selecting the fourth analytic module in Step 3, users can use the user-defined branch.
This feature allows users to manually input gene names linked to metabolites. The entered genes will be incorporated as independent entities into subsequent network construction and pathway enrichment analyses.
The user-defined metabolite-related proteins/genes and their interaction relationships will be presented in tables. Furthermore, the interaction data is available for download in edges-and-nodes format, enabling visualization in network analysis tools such as Cytoscape.
The results of the disease-related proteins ORA will be displayed. In this example, a warning was generated because there was no overlap between the user-defined genes/proteins and the disease-related proteins. Additionally, if disease specification was skipped in Step 2, the ORA could not be conducted, resulting in the same warning.
The matched proteins/genes can be used to perform term enrichment analysis in Step 4. Users can select standard databases (e.g., Gene Ontology, KEGG Pathways, WikiPathways, Reactome Pathways, Disease Ontology) for enrichment analysis. If the provided databases do not meet specific needs, users have the option to upload custom databases by submitting tab-separated TERM2GENE and TERM2NAME files. These files must adhere to specific formats: TERM2GENE requires Term IDs paired with Gene IDs, while TERM2NAME maps Term IDs to descriptive names.
The results of the enrichment analysis are displayed in a table with a global search box at the top right and filters under the header for domain-specific browsing. These tools enable users to focus on specific terms and view statistical details. Terms of interest can be selected for visualization by ticking the square icon next to the term name. The selected terms are presented as a composite diagram comprising a Sankey plot and a dot plot. The Sankey plot on the left shows connections between the user-defined genes/proteins, their interaction genes/proteins, and selected terms. The dot plot on the right illustrates the number and proportion of predicted genes/proteins involved in the enriched terms, along with the adjusted p-values indicating statistical significance. The plot can be zoomed in or out using the controls in the lower-right corner.
In Step 0, select the "Batch analysis of multiple metabolites" module to start the analysis.
To perform analyses on multiple metabolites, the first step involves inputting the relevant metabolite information. We provide two options for data upload:
(i) input a list of metabolites for downstream analysis; mixed naming and identification systems for metabolites are allowed;
(ii) alternatively, use significantly differentially abundant metabolites identified by the Biomarker discovery module as input for downstream analysis.
To define the metabolite biomarkers, a metabolic abundance table and a corresponding metadata file containing sample grouping information are required. The metabolite abundance table supports multiple naming conventions and identification systems, which can be used concurrently. As a reference, an example dataset including metabolomic profiles and metadata from individuals with Crohn's disease and non-IBD controls (PMID: 30531976) is available for download.
Users can upload their own files by selecting the “Browse” button to locate the desired files and clicking “Upload” to initiate the upload process.
The uploaded data will undergo transformation and scaling. Choose the appropriate methods, and then click “Proceed” to continue.
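The page does not list which transformation and scaling methods are offered, so the sketch below uses two widely used choices in metabolomics as stand-ins: a log2 transformation with a pseudo-count, followed by autoscaling (mean-centering and division by the sample standard deviation). The function names and the pseudo-count value are assumptions for illustration.

```python
from math import log2, sqrt


def log2_transform(values, pseudo_count=1.0):
    """Apply log2(x + pseudo_count) so that zero abundances stay defined."""
    return [log2(v + pseudo_count) for v in values]


def autoscale(values):
    """Center to mean 0 and scale to unit sample standard deviation."""
    n = len(values)
    mean = sum(values) / n
    sd = sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))
    if sd == 0:
        raise ValueError("cannot autoscale a constant profile")
    return [(v - mean) / sd for v in values]


raw = [0.0, 1.0, 3.0]              # abundances of one metabolite across samples
transformed = log2_transform(raw)  # compresses large values
scaled = autoscale(transformed)    # makes metabolites comparable in magnitude
print(transformed)
print(scaled)
```

Transformation reduces skew in abundance distributions, while scaling prevents highly abundant metabolites from dominating downstream statistics.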
Next, choose the grouping information from the uploaded metadata and set parameters for differential abundance analysis, then click “Run” to move forward.
Now, set the statistical thresholds to identify significantly differentially abundant metabolites, then click “Filter” to continue.
Statistical results for all metabolites will be presented in a table and are available for download. Only metabolites identified as significantly differentially abundant will be carried forward into subsequent analysis. Comprehensive summaries will be provided to describe the analytical methodology, including data structure, preprocessing techniques, and the statistical criteria used to identify differentially abundant metabolites.
The input of a target disease is used to obtain proteins associated with that specific disease. Then, an over-representation analysis (ORA) will be performed to determine whether the resulting disease-related proteins are statistically over-represented among those predicted to interact with the query metabolite. When a disease is selected from the list (using the radio button, e.g., DOID:8778), the corresponding disease-related proteins are automatically passed to the ORA.
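The filtering step can be pictured as applying cutoffs to a per-metabolite results table. The field names (`adj_p`, `log2fc`), the default cutoffs, and the metabolite labels below are hypothetical; the actual statistics and thresholds are whatever is selected in the interface.

```python
def filter_significant(results, max_adj_p=0.05, min_abs_log2fc=1.0):
    """Return names of metabolites passing both thresholds (illustrative criteria)."""
    return [
        row["metabolite"]
        for row in results
        if row["adj_p"] <= max_adj_p and abs(row["log2fc"]) >= min_abs_log2fc
    ]


# Made-up statistics, for illustration only.
results = [
    {"metabolite": "metabolite_A", "adj_p": 0.001, "log2fc": 2.3},
    {"metabolite": "metabolite_B", "adj_p": 0.030, "log2fc": -1.4},
    {"metabolite": "metabolite_C", "adj_p": 0.200, "log2fc": 3.0},
    {"metabolite": "metabolite_D", "adj_p": 0.010, "log2fc": 0.2},
]
print(filter_significant(results))
```

Only the metabolites returned by such a filter would be carried into the downstream branches, which is why the thresholds directly control the scope of the batch analysis.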
Or users can manually input a list of disease-related proteins in the second option with protein names or NCBI Entrez Gene IDs.
Alternatively, if there is no specific disease to be studied, users can opt to skip this section using the third option, in which case disease-related proteins ORA will not be performed in the later analyses.
The analysis scope depends on the input selection made in Step 1:
Set the parameters for the three analytic branches:
The resulting target proteins, structurally similar metabolites and co-abundant metabolites will each be used to retrieve potential interaction proteins/genes based on the threshold of confidence score (in the "Interaction Network" panel). Confidence score ranges from 0 to 1, with a default threshold of 0.7.
Note: This step takes time to complete.
If the co-abundant metabolites branch is not skipped, the results of WGCNA analysis will be presented in a tabular format.
When significantly differentially abundant metabolites identified by the Biomarker discovery module are used as input in Step 1, a circular plot summarizing key features of these biomarkers will be displayed. The layers of the plot, from inner to outer, are described as follows:
A summary of disease-related gene statistics for each metabolite, based on ORA results from the three analytic branches, will be shown in a table. To view detailed results and perform further analysis for a specific metabolite, click the circle icon next to the corresponding metabolite name.
Users can search for a specific metabolite using the global search box positioned at the top-right of the interface. Upon locating the desired metabolite (e.g., urobilin), clicking the radio button next to its name will enable further analysis.
The target proteins branch for urobilin will show a warning: “The potential interaction genes/proteins were not found”. This indicates that no target proteins for urobilin are present in the default database.
The results of the structurally similar metabolites branch will be presented, with details as previously described (see "Analysis of a specific metabolite" module).
Similarly, the results of the co-abundant metabolites branch will be presented, with details as previously described (see "Analysis of a specific metabolite" module).
We appreciate your feedback on our database and webserver. If you have any issues or suggestions, please feel free to contact us.
We are committed to addressing your concerns and considering your ideas for MDLink's future updates. Your support and interest are greatly appreciated!