Linking the duodenal microbiota to stunting in a cohort of undernourished Bangladeshi children with enteropathy

BACKGROUND Environmental enteric dysfunction (EED) is an enigmatic disorder of the small intestine postulated to play a role in childhood undernutrition, a pressing global health problem. Defining the incidence of EED, its pathophysiology, and its contribution to impaired linear and ponderal growth has been hampered by the difficulty in directly sampling the small intestinal mucosa and microbial community (microbiota). METHODS Slum-dwelling Bangladeshi children aged 18±2 months, with linear growth-faltering (stunting) who failed a nutritional intervention underwent endoscopy to obtain duodenal biopsies and aspirates. Levels of 4077 plasma proteins and 2619 duodenal proteins were quantified in 80 children with histopathologic evidence of EED, and the abundances of bacterial strains in their duodenal microbiota were determined using culture-independent methods. Young germ-free mice, fed a Bangladeshi diet, were colonized with bacterial strains cultured from the duodenal aspirates. RESULTS The absolute abundances of a shared group of 14 bacterial strains recovered from the duodenums of children with EED and not typically classified as enteropathogens were negatively correlated with linear growth (length-for-age Z-score;β=-0.38±0.12(SEM);ρ=-0.49;p=0.003), and positively correlated with duodenal proteins involved in immunoinflammatory responses. Representation of these 14 duodenal taxa was significantly different in fecal microbiota from EED versus healthy children (p<0.001;PERMANOVA). Gnotobiotic mice colonized with cultured EED-donor duodenal strains develop a small intestinal enteropathy. CONCLUSIONS These results provide evidence of a causal relationship between components of the small intestinal microbiota, enteropathy and stunting and offer a rationale for developing therapeutics that target what must no longer remain terra incognita-the small intestinal microbiota. ClinicalTrials.gov identifier: NCT02812615


Sample collection
Sample collection procedures have been described previously (1).Five mL of venous blood was collected (S-Monovette 7.5 mL tube, Sarstedt) just prior to endoscopy, transported to the laboratory at 4 °C, centrifuged to collect plasma; plasma was stored in aliquots at -80 °C.[Celiac disease was excluded by measuring circulating levels of tissue transglutaminase IgA].Duodenal aspirate samples were collected from the second portion of the duodenum using sterile Endoscopic Retrograde Cholangiopancreatography (ERCP) catheters.Several distal duodenal biopsies were taken (1).A duodenal biopsy was flash frozen in liquid nitrogen and preserved at -80 °C for subsequent proteomic analysis.Fecal samples were collected immediately prior to endoscopy, aliquoted into 2 mL cryovials, placed in liquid nitrogen within 20 minutes of production and subsequently stored at -80 °C until further processing.

Assessing histopathologic severity of EED
An operational categorization of EED was developed for this study based on three histopathologic features evident in hematoxylin-and eosin-stained tissue sections: infiltration of inflammatory cells in the lamina propria, blunting or atrophy of small intestinal villi, and hyperplasia or elongation of crypts of Lieberkuhn.Because inflammatory infiltration takes place prior to structural changes of either villus atrophy or crypt hyperplasia, mild EED (Grade 1) refers to the sole presence of inflammatory infiltrates in the lamina propria.Moderate EED (Grade 2) refers to any one of the two structural changes, i.e. either villi or crypts, in addition to inflammatory infiltration in the same biopsy sample.Severe EED (Grade 3) refers to the presence of structural changes involving both villi and crypts in addition to inflammatory infiltration in the same duodenal biopsy sample.Biopsies without any of these three features were considered as having no evidence of EED (Grade 0).

Breastfeeding
Mothers of BEED study participants were interviewed prior to and after nutritional intervention to ascertain their 'breastfeeding status', e.g. a binary metric representing whether or not the child was breastfeeding at the time of the interview.PCA parameterized by the 14 core taxa followed by PERMANOVA was performed on the 36 children with histopathologic evidence of EED to determine whether breastfeeding status was a significant determinant of duodenal community bacterial composition.

Processing of plasma samples for proteomic analyses
The SomaScan 5K Proteomic Assay plasma/serum kit (SomaLogic) was used to measure 5,284 proteins in plasma samples (50 μL aliquots) according to the manufacturer's protocol.Proteins were tagged with NHS-biotin reagent, captured as a SOMAmer (protein-specific aptamer) /protein complex immobilized on streptavidin beads, cleaved, denatured, eluted and hybridized to a custom Agilent DNA microarray.
Microarrays were scanned with an Agilent SureScan instrument at 5 µm resolution and the Cy3 fluorescence readout was quantified.Raw fluorescence signal values from each SOMAmer reagent were processed using SomaLogic's SomaScan standardization procedures, including steps for hybridization normalization, plate scaling, median scaling and final SOMAmer calibration, each of which generates a SomaScan '.adat' format data file.The final .adatfile was log2-transformed, quantile-normalized and filtered to remove all non-human SOMAmer reagents.An additional filter was applied to remove SOMAmer reagents with a probe-wise variance <0.07 to ensure protein abundances were within the limit of detection, resulting in quantitative abundance data for 4,077 plasma proteins used for all downstream analyses.We used singular value decomposition (SVD) to remove batch-associated variance from the SOMAmer fluorescence values.The standardized, transformed, normalized, filtered, and batch-corrected signal values were employed for comparisons between the plasma proteomes of anthropometrically healthy children and children with EED, as well as for pre-versus post-nutritional intervention comparisons.For all other analyses utilizing measured abundances of plasma proteins, batch-correction was not performed.Note that the SomaScan platform contains aptamers with redundancies against several proteins.The reasons for these redundancies include capturing different isoforms, detecting proteolytic cleavage products or providing an internal control for probe specificity.

Processing of duodenal biopsy samples for proteomic analyses
Protein extraction from each duodenal biopsy was accomplished by adding 100 µL of ice-cold T-Per tissue protein extraction buffer (Thermo Scientific) plus Complete Ultra protease inhibitor (Roche).
Samples were homogenized on ice using a disposable manual pestle until no tissue fragments were visible, and centrifuged at 14,000 x g for 10 min at 4 o C. Total protein concentration in the resulting supernatants was quantified using Micro BCA Protein Assay Kit (Thermo Scientific).Protein concentrations in all samples were normalized to 200 µg/mL with PBS.
The SomaScan 5K Proteomic Assay CEL/tissue kit was used to measure 5,284 proteins in duodenal lysate samples (50 μL aliquots of a 200 µg/mL mixture) according to the manufacturer's protocol.Fluorescence values for each SOMAmer reagent were standardized, transformed, normalized, and filtered in the same fashion as the plasma proteins.Principal components analysis (PCA) revealed batch effects that were corrected for using the 'removeBatchEffects' function from the R package limma (2).The resulting standardized, transformed, normalized, filtered, and batch-corrected signal values for 2,619 duodenal proteins were used for all downstream analyses.

Bacterial 16S rDNA analyses
Prior to isolating DNA from duodenal aspirates, 1.1 x 10 6 Alicyclobacillus acidiphilus cells were added to a 100 µL aliquot of an aspirate sample as a spike-in quantitative control.The material was then centrifuged at 5000 x g for 10 min at 10 o C in a 2 mL v-bottom tube (Axygen).All but 50 µL of the resulting supernatant was removed (to avoid disturbing the pellet).The pellet was then resuspended in the residual supernatant by pipetting, followed by digestion with proteinase K (Thermo Fisher Scientific; final concentration, 1 µg/µL reaction; incubation for 10 hours at 65 o C, followed by 10-minutes at 95 o C to inactivate the enzyme).DNA from the resulting duodenal aspirate material, and from fecal samples (3), were subsequently isolated and V4-16S rDNA amplicon libraries were generated as described previously (3).Amplicon libraries were sequenced (2x250 nt paired-end reads; Illumina MiSeq instrument) to a depth of 3.9 ± 2.3 x 10 4 reads/duodenal aspirate and 4.4±7.6 10 4 (mean±SD) reads/fecal sample, yielding a mixture of phased amplicons in two different orientations to achieve optimal base balance.The 'bbduk.sh' and 'repair.sh'tools in bbtools (37.02; https://sourceforge.net/projects/bbmap/) were used to orient the amplicons, trim primer sequences, and ensure proper read pairing.Pre-processed data were analyzed using the R (3.5.1) implementation of DADA2 (1.8.0; (4)) to identify and quantify ASVs.
Taxonomic assignments of ASVs were performed using the DADA2 'assignTaxonomy' tool and the

GreenGenes training set (version 13.8).
For duodenal aspirates, ASVs were agglomerated by phylogenetic distance using the phyloseq 'tip_glom' tool (tree height of < 0.2), and bacterial loads in duodenal aspirates were calculated by dividing total 16S rDNA amplicon sequencing reads not mapped to A. acidiphilus by A. acidiphilus reads (5,6).Counts were normalized using DESeq2 (7), scaled by bacterial load, to obtain the absolute abundance of each agglomerated ASV, and log10-transformed to obtain the final ASV absolute abundance data used for downstream analyses.
For the analysis of fecal microbiota of healthy children or children with EED, ASVs were agglomerated using 'tip_glom' and normalized using DESeq2 in the same fashion as ASVs from the duodenal aspirates.Agglomerated and normalized fecal ASV abundances were then transformed using 'varianceStabilizingTransformation' from the DESeq package (7), producing the compositional abundance data used for downstream analyses.

Quantifying pathogen burden by multiplex quantitiative PCR
Nucleic acids were isolated from duodenal aspirates and fecal samples (3) and the concentration normalized to 2 ng/µL.Levels of bacterial, viral, and protozoal gastrointestinal pathogens were determined using a microfluidic-based digital PCR system with 96.96 Dynamic Arrays (Fluidigm).
TaqMan primers and probes were used to construct assays for 18 different enteropathogens.qPCR and data analysis methods are described in ref. (9).Reported values are absolute measurements of DNA mass in pg (all bacterial and protists) and copy number (all RNA viruses).Adenovirus-infected cells were used as the standard for the Adenovirus target; values reported for Adenovirus are a ratio of mass of viral DNA per cell lysate mass.Values were log-transformed prior to correlation analysis.

Recovering, sequencing and annotating bacterial strains from duodenal aspirates
All steps for culturing bacterial strains were performed in anaerobic chambers (Coy Laboratory Products).
Duodenal aspirate samples (volumes ranged from 300 to 500 µL) were thawed.A 100 µL aliquot was taken for nucleic acid extraction, while the remaining volume was divided evenly into fourths to streak on BHI agar supplemented with 10% defibrinated horse blood, with and without vancomycin (10 µg/mL).
Plates were incubated at 37 o C for 2-3 days under anaerobic (5% H 2 , 20% CO 2 , 75% N 2 ) and microaerophilic (5% O 2 , 5% CO 2 , 90% N 2 ) conditions.Colonies were re-struck and initially identified by Matrix-Assisted Laser Desorption Ionization Time-of-Flight (MALDI-TOF) mass spectrometry (Vitek MS; bioMérieux Clinical Diagnostics).Cells were sub-cultured in LYBHI broth (BHI broth supplemented with 0.05% L-cysteine HCl and 0.5% yeast extract) for anaerobic colonies and BHI broth for microaerophilic colonies.Broth cultures were incubated for 1-2 days.Multiple stock vials were prepared in PBS/15% glycerol and stored at -80 o C. DNA was extracted from individual strains, quantified (Qubit), and normalized to a concentration of 0.75 ng/mL.Libraries were generated from each sample using the Nextera XT kit (Illumina).Genomes were assembled using SPAdes (10) under default parameters, and annotated with Prokka (11) and RAST (12).Species assignments were made using CheckM (13).Unique species were identified by calculating pairwise ANI (average nucleotide identity) values between genomes, using an ANI threshold of greater than 95% for species demarcation.Based on these results, we selected 39 cultured sequenced strains representative of the bacterial taxa identified by V4-16S rDNA amplicon sequencing and MALDI-TOF.These organisms were clonally-arrayed into multi-well plates.
Monocultures of the organisms were combined in equivalent amounts, as assessed by OD 600 values, to create a defined consortium for gavage into gnotobiotic mice.
In silico reconstructions of selected mcSEED metabolic pathways were based on functional gene annotation and prediction using homology-based methods and genome context analysis (14,15).
Reconstructions were represented as a binary phenotype matrix (BPM) where for amino acids and B vitamins, "1" denotes a predicted prototroph and "0" an auxotroph, for carbohydrates, "1" and "0" refer to a strain's predicted ability or inability, respectively, to utilize the indicated mono-, di-or oligosaccharide, and for fermentation end products (short chain fatty acids), a "1" and "0" indicate a strain's predicted ability/inability to produce the indicated compound, respectively.

GNOTOBIOTIC MOUSE STUDIES
All mouse experiments were performed using protocols approved by Washington University Animal Studies Committee.

Design and preparation of a representative Mirpur diet
The composition of this diet was based on Bangladeshi complementary feeding practices for 18-monthold Mirpur children, including quantitative 24-hour dietary recall surveys conducted at the Mirpur site as part of the MAL-ED study (16).A pelleted version of this diet was manufactured for gnotobiotic mouse studies by Dyets Inc. Rice (parboiled, long grain) and red lentils (masoor dal) were each cooked separately with an equal weight of water at 100 o C in a steam-jacketed kettle until 'par-cooked' (grains cooked, but still firm) and then set aside.Fresh market white potatoes, spinach and yellow onions were washed, chopped in a vertical cutter mixer and cooked in the kettle without added water at 70 o C until soft.Sweet pumpkin (Calabaza variety) was cut and boiled in the steam-jacketed kettle until soft, and then strained.At this point, all of the cooked ingredients were combined, soybean oil, salt, turmeric and garlic were added (see Table S8 for quantities of ingredients).The resulting diet was mixed extensively, spread on trays, dried overnight at 30 °C, and pelleted by extrusion (½" diameter; California Pellet Mill, CL5).
Dried pellets were aliquoted into ~250g portions, placed in a paper bag with an inner wax lining, which was placed in a plastic bag.Bags were subsequently vacuum sealed and their contents sterilized by gamma irradiation (30-50 kGy; Sterigenics).Sterility was verified by culturing irradiated pellets in Brain Heart Infusion (BHI) broth, Nutrient broth, and Sabouraud-dextran broth (all from Difco) for one week at 37 °C under aerobic conditions, and in Tryptic Soy broth (Difco) under anaerobic conditions (atmosphere of 5% H 2 , 20% CO 2 , 75% N 2 ).Additionally, cultures of all diets were plated on BHI agar supplemented with 10% horse blood (Difco).The irradiated diet pellets had an energy density of 4.88 kcal/g and were composed of 10.5% protein, 21.1% fat, and 0.42% fiber (Nestlé Purina Analytical Laboratories).
All diets were stored at -20 °C prior to use.

Animal Husbandry
Mice were housed in plastic flexible film gnotobiotic isolators (Class Biologically Clean Ltd.) at 23 o C under a strict 12-hour light cycle (lights on a 0600h).Male germ-free C57BL/6J mice were initially weaned onto an autoclaved, low-fat, high-plant polysaccharide chow that was administered ad libitum (Diet 2018S, Envigo).Animals were maintained on this diet until 3 days prior to the beginning of experiments.At 5.5 weeks of age, the defined consortium of 39 sequenced bacterial strains obtained from children with EED, or intact uncultured microbiota recovered from the cecum of an adult conventionallyraised C57BL/6 mouse was administered using a disposable sterile gavage needle (n=5 mice/cage/treatment group/experiment).All animals were weighed every 2 days and euthanized by cervical dislocation without prior fasting at the end of the study.

Community profiling by sequencing (COPRO-Seq)
The distribution of bacterial strains along the length of the gastrointestinal tracts of mice gavaged with the 39 cultured strains was defined by COPRO-Seq (17).Briefly, DNA was isolated from luminal contents harvested at the time of euthanasia from the duodenum (defined as proximal third of the small intestine), jejunum (middle third), ileum (distal third), cecum, colon, and feces.COPRO-Seq libraries were prepared; the barcoded libraries were quantified (Qubit dsDNA HS kit), pooled and then subjected to multiplex sequencing [Illumina NextSeq instrument; unidirectional 75 nt reads; 2.1 ± 0.2 x 10 6 reads/sample (mean ± SD)].Reads were demultiplexed and mapped to the reference genomes of community members plus 2 "distractor" genomes (Bacteroides fragilis NCTC 9343 and Clostridium perfringens ATCC13124).The proportion of reads mapping to "distractor" genomes in each sample was used to set a conservative threshold cutoff (mean + 2 SD), indicating the presence/absence of an organism in the community on a per-sample basis.Normalized counts for each bacterial strain in each sample were used to produce a relative abundance table.

Assays of MMP-8
Total protein was extracted from duodenum, jejunum, and ileum using the same procedure described above for human duodenal biopsies.Levels of MMP-8 in these intestinal segments and in the serum of mice colonized with the EED consortium, and in CONV-D controls, were assayed using MILLIPLEX MAP Mouse MMP Magnetic Bead Panels (MilliporeSigma) following the manufacturer's instructions.

Duodenal RNA-Seq
Total RNA was isolated from 10 mg of flash frozen tissue taken 6 cm from the gastroduodenal junction of mice euthanized 9 days after the initial gavage with the EED consortium or the cecal contents from a conventionally-raised C57Bl/6J mouse (RNeasy Plus Universal Mini Kit; Qiagen).Total RNA quality was checked with an Agilent Bioanalyzer 2100 using RNA 6000 Pico Chips (Agilent).For each sample, cDNA was synthesized from 10 ng total RNA using the SMARTer Ultra Low Input RNA for Illumina Sequencing -HV kit (Clontech).cDNA was sheared to 200-500 bp with a Covaris AFA system.A library was constructed following the Clontech "adapted Nextera (Illumina) DNA sample preparation protocol for use only with SMARTer ultra-low DNA kit for Illumina sequencing."Samples were sequenced on an Illumina NextSeq 500 instrument using the High-Output 150 v2 Kit to generate 75-nucleotide paired-end reads [21.2 ± 9.3 x 10 6 (mean ± SD) reads/sample].Reads were aligned to the Ensembl release 89 mouse primary assembly with STAR v2.5.3a (18).Gene counts were derived from the number of uniquely aligned unambiguous reads using featureCounts (Subread version 1.4.6-p5)(19).Sequencing performance in terms of total number of aligned reads, total number of uniquely aligned reads and gene body coverage was determined using RSeQC version 2.6.2 (20).
Gene counts were imported into the R/Bioconductor package DESeq2 (7) and normalized using default settings.Genes whose normalized counts added up to less than one across all samples were removed.Differential expression analysis was performed using DESeq2 (7).Differentially expressed genes were defined as those with an FDR-adjusted p<0.10 and fold-difference >1.25.

Flow cytometry
Small intestines were flushed to remove luminal contents.Intestines were opened lengthwise and gently agitated at 23 o C for 20 minutes in Hanks Balanced Salt Solution (HBSS) supplemented with 10% bovine calf serum, 5 mM EDTA and 15 mM HEPES.Intestines were vortexed and supernatants were separated from tissue fragments by careful decantation.A second round of gentle agitation and vortexing of the pellets in the same buffer was performed and the supernatants from each round were combined and used for staining of intraepithelial lymphocytes.The tissue was then rinsed with HBSS prior to digestion with Collagenase IV (Sigma) in complete RPMI-10 for 40 minutes at 37 °C with gentle agitation.Digests were filtered through 100 µm mesh (BD Biosciences) and subjected to density gradient centrifugation using 40% and 70% Percoll solutions (GE Healthcare).Mesenteric lymph nodes were excised and single cell suspensions were prepared by mechanical disruption for T-cell isolation.For flow cytometry, single cell suspensions were incubated with Fc Block for 10 minutes, and then stained with antibodies and Fc Block for 20 minutes at 4 °C.Dead cells were excluded using either a Live/Dead Fixable Cell Stain Kit (ThermoFisher Scientific) or DAPI (Sigma).Intracellular proteins were stained using either the BD Biosciences Fixation/Permeabilization Solution Kit or the eBioscience Transcription Factor staining kit.
Cells were run on a FACSCanto II instrument (BD Biosciences) and data were analyzed using FlowJo (FlowJo LLC).Cells counts were performed with counting beads (eBioscience).

Assaying spleens for viable bacteria
Nine days after the first gavage, spleens were harvested at the time of euthanasia and homogenized in 1.0 mL PBS. 100 µL of the resulting homogenate was plated on BHI horse blood agar and incubated for at 37 o C for 12-16 hours under aerobic conditions.Cultured isolates were initially classified by their morphotypes, followed by Vitek MALDI-TOF mass spectrometry and sequencing full-length 16S rDNA amplicons (generated from their genomic DNA using primers 8F and 1391R).

Effects of nutritional intervention on ponderal and linear growth
A paired t-test between WAZ and LAZ at the time of endoscopy and WAZ and LAZ just prior to beginning the nutritional intervention was performed to determine the effects of the nutritional intervention on ponderal/linear growth.

Relationship between histopathologic score and linear growth
The effect of histopathologic severity on LAZ was assessed by creating a linear model containing histopathologic score, sex, and age at the time of endoscopy as covariates and LAZ at the time of endoscopy as the outcome; the coefficient reported in Table 1 describes the relationship between histopathologic score and LAZ, controlling for sex and age.

Relationship between fecal biomarkers and linear growth
Concentrations of fecal biomarkers measured from the 110 stunted children who failed nutritional intervention were log-transformed and correlated against LAZ.

Relationship between a fecal 'EE biomarker score' and LAZ
A previously described environmental enteropathy score ('EE score') based on trichotomized measurements of fecal alpha-1-antitrypsin (AAT), myeloperoxidase (MPO), and neopterin (NEO) was adapted for this study (21).Briefly, the published 'EE score' is calculated by the following equation: The total 'EE score' ranges from 0-10, with 10 being the most severe.However, because the score is based on quantiles, comparisons between patient populations are impacted by the unique distribution of biomarker levels in different study cohorts.We therefore adapted the 'EE score' into a continuous composite variable as described in (22) and defined by the following equation:

Comparative analyses of the plasma proteomes of healthy children and children with EED
Three comparisons using an empirical Bayesian linear modeling framework [limma (2)] were performed: healthy vs. BEED pre-intervention; healthy vs. BEED post-intervention; and BEED pre-vs.postintervention.For the former two analyses, linear models including 'group' (healthy or BEED), age and sex as covariates and the abundance of a protein as the response were generated using limma.For the latter analysis, 'group' was the primary predictor and patient ID was added as a covariate to encode a paired analysis in limma.For each of the three analyses, only individuals with a duodenal biopsy and a histopathologic diagnosis of EED were included (n=80).

Relating fecal biomarkers and plasma proteins to LAZ
To determine the relationship between plasma proteins and LAZ at the time of endoscopy, Pearson correlation coefficients were calculated between each of the 4077 log2-transformed, quantile-normalized plasma protein abundances and LAZ.To identify modifiers of the effects of IGF-1 on LAZ, a separate linear model was created for each plasma protein or log-transformed fecal biomarker using the following formula: where  0 is the model intercept,  1 and  2 are the coefficients for the main effects of IGF1 and the putative interacting protein, respectively, β is the coefficient for the interaction term, and ε is the residual.
Statistical significance was calculated using the 'lm' function in R, and is reported in Table S4 for the interaction coefficient β.Linear models for which coefficients are reported were generated after z-scoring the independent variables.Thus, the β coefficient represents the change in the effect of IGF-1 on LAZ after a unit standard deviation change in the interacting protein.

Identifying modules of co-expressed duodenal proteins
Independent components analysis (ICA) was initially developed in the field of signal processing to solve the blind-source separation (BSS) problem (23).One formulation of the BSS problem is trying to identify which instruments are playing in an orchestra while only hearing the entire orchestra play at the same time.More recently, ICA was shown to be an effective module detection method to identify biologically meaningful co-expression profiles in various types of expression data when formulated as a BSS problem (24).Given an observed signal (protein expression), ICA attempts to deconvolve the signal into statistically independent components (ICs).These ICs represent modules of proteins whose members have distinct expression profiles compared to proteins in other ICs/modules.The measured variation in protein expression between samples can thus be thought of as alterations in the activation or inhibition of statistically orthogonal biological pathways that have been 'mixed' together; the resulting admixture is the plasma proteomic data measured by SomaScan, while the discovered ICs represent pathways of potential interest.
The number of ICs was determined using a random matrix theory approximation described elsewhere (25).ICA was performed using the R implementation of the JADE algorithm (23), which attempts to maximize the difference in fourth-order moments as a proxy for identifying statistically orthogonal ICs.For the n x m protein expression matrix (n = number of proteins, m = number of subjects), ICA returns a n x s protein projection matrix where each element is the projection of protein n along IC/module s, as well as a m x s subject projection matrix where each element is the projection of patient m along IC/module s.Modules were defined by identifying proteins with statistically significant (FDRcorrected p-value < 0.05) projections along a given IC as calculated by the 'fdrtool' package in R (26).
To determine the functional significance of modules identified by ICA, we performed GO overrepresentation analysis on each module using the R package topGO (29).TopGO attempts to decorrelate dependencies between GO terms when calculating enrichment scores using a combination of gene (protein) removal and maximizing enrichment scores within a given neighborhood on the GO directed acyclic graph.The weight01 method was used to calculate enrichment, and a p-value < 0.05 was used to define statistically significant enrichment of a GO term.The p-value represents the adjusted pvalue calculated by topGO, which applies a Bonferroni-like correction during p-value calculations.All GO 'biological processes' with 10 or more proteins represented in the 2619 duodenal proteins passing QC were included in the analysis.

Cross-correlation singular value decomposition (CC-SVD) analysis
CC-SVD decomposes a cross-correlation matrix into flexible modules of features with conserved correlation profiles.Beginning with two matrices with dimension m x n and m x p, CC-SVD computes a Pearson cross-correlation matrix (n x p).Next, the matrix is decomposed using singular value decomposition (SVD) into left and right singular matrices that contain the left and right singular vectors, also known as eigenvectors (EVs); these EVs represent correlation profiles, or modules of highly crosscorrelated features.
The number of modules was determined using the random matrix approximation described elsewhere (25).The module size was chosen to include duodenal proteins with the top 50 most positive and most negative projections.All 14 ASVs were included.Module members were then extracted from the cross-correlation matrix and plotted using the 'corrplot' function in R (28) in rank order by their projections onto the EV, with larger magnitude projection values indicating a stronger correlation profile.
Proteins with positive projections are positively correlated with ASVs with positive projections, and negatively correlated with ASVs with negative projections.Conversely, proteins with negative projections are positively correlated with ASVs with negative projections, and negatively correlated with proteins with positive projections.Thus, the approach leverages the ability of SVD to identify orthogonal variance in a cross-correlation matrix in order to find modules of covarying features across multiple feature types; this approach was used to assess relationships between the duodenal proteome and the absolute abundances of duodenal bacterial taxa.CC-SVD was also performed on the 4077 plasma proteins and the top 100 (50 positively and 50 negatively correlated) duodenal microbiota-associated duodenal proteins to identify members of the plasma proteome that significantly correlated with the profile of the 100 duodenal proteins.Projections of anorexia and muscle wasting (36) (limma unadjusted p<0.01; see Figure S2 and Table S2E,F).

Characterization of immune cell populations in gnotobiotic mice colonized with the EED donorderived bacterial consortium
To characterize immune cell populations in the small intestinal epithelium and lamina propria plus mesenteric lymph nodes, two follow-up experiments were performed where the entire length of the small intestine and mesenteric lymph nodes were recovered from mice euthanized 7 days following the final gavage with (i) the full 39-member bacterial consortium or (ii) the cecal contents of a conventionallyraised mouse (n=5 animals/treatment group/experiment; Table S12).Within the epithelial compartment, there were no statistically significant differences in the absolute numbers or frequencies of TCRαβ or TCRγδ lymphocytes between the two groups of animals.In the small intestinal lamina propria compartment, we observed a significant increase in the number and frequency of CD4+ T-cells and among them Rorγt+ Th17 but not Foxp3+ Treg cells in mice colonized with the EED consortium.Moreover, ex vivo tests of effector function in small intestinal lamina propria-derived activated T cells disclosed increased IL17A production but no significant difference in IFNγ production in mice harboring the EED community compared to the CONV-D group.
The increase in Th17 cells in EED mice was paralleled by changes in the innate immune compartment, particularly among ILC3 subsets which, along with Th17 cells, contribute to the maintenance of the intestinal epithelial barrier through secretion of IL-17 and IL-22A (37).CCR6+ ILC3s, which are the major innate source of IL-17, were significantly decreased, whereas NKp46+ ILC3s, which predominantly produce IL-22, were significantly increased in the small intestinal lamina propria of EED consortium-colonized mice (Table S12).Ex vivo stimulation experiments disclosed that IL-17A production by CCR6+ ILC3s was also significantly reduced in mice harboring the bacterial consortium, while no significant differences were observed in IL-22 production by NKp46+ ILC3s between the two groups of animals (Table S12).Thus, the observed increase of Th17 cells may reflect a compensatory response to a defect in CCR6+ ILC3s.
Mice colonized with the EED consortium also had significantly increased numbers of lamina propria CD11b+ CD103+ dendritic cells and Th17 cells among the memory CD4+ T cell (CD44 hi CD62L lo ) population present in their mesenteric lymph nodes (Table S12).Dendritic cells present in intestinal tissues acquire antigens and migrate to mesenteric draining lymph nodes where they prime antigen-specific T cells.Interestingly, CD4+ T-cells producing IL-17A are almost exclusively primed by CD11b+CD103+ dendritic cells in mesenteric lymph nodes (38)(39)(40), supporting our observation that colonization with the duodenal bacterial consortium from children with EED induces an immune response in the small intestine biased towards the generation of Rorγt+Th17 cells.Such a response, if sustained, may contribute to intestinal inflammation.The large circle represents the mean and the vertical line denotes the standard error.Differential abundance analysis for each of the 14 duodenal 'core taxa' was performed using a two-sided Mann-Whitney U test (FDR-corrected).PCA of normalized, variance-stabilized counts followed by permutation ANOVA (PERMANOVA) was used to test for significant separation between healthy versus EED fecal samples.In this binary phenotype matrix "1" denotes a predicted prototroph and "0" a predicted auxotroph as inferred from in silico reconstruction of the respective mcSEED metabolic pathways/modules for amino acid and B-vitamin biosynthesis.In the case of carbohydrates, '1' and '0' refer to a strain's predicted ability or inability, respectively, to utilize the indicated mono-, di-, or oligosaccharide.Binary phenotypes for fermentation end products (short chain fatty acids, SCFA) are related to a strain's predicted ability to produce the indicated compound.In bold are the 23 strains detected at greater than 0.01% relative abundance at one or more locations along the gastrointestinal tract, averaged across all recipient mice.

ONLINE SUPPLEMENTARY MATERIALS Online Supplementary Data Table 1 -Duodenal proteome module membership in children with
EED and GO Biological Process enrichment for each module.
= 2 × (  /) + 0.2 × (  /) + (  /) This modified score is based on the absolute concentrations of each fecal biomarker, preserves the relative contribution of each biomarker to the total 'EE biomarker score', and brings the score into a comparable arithmetical range to the originally published 'EE score'.[In the cohort of 110 stunted children who did not respond to nutritional intervention, the trichotomized 'EE score' and the continuous 'EE biomarker score' were well correlated (Pearson rho=0.64,p=5.56x10 -14 )].The resulting 'EE biomarker score' was log-transformed and correlated against LAZ.

Figure S2 -
Figure S2 -Plasma proteins whose abundances are significantly different between children living inMirpur with EED, prior to and after their nutritional intervention, and children from Mirpur judged to have healthy growth phenotypes.Shown is an 'UpSet' plot generated in R using data shown in TableS5.

Figure S3A -
Figure S3A -CC-SVD analysis of duodenal proteins and duodenal bacterial taxa

Figure S3B -
Figure S3B -CC-SVD analysis of duodenal proteins and duodenal bacterial taxa

Figure S5 -
Figure S5 -Relative abundances of core duodenal taxa in the fecal microbiota of children living in Mirpur who have healthy growth phenotypes (n=27) and those with EED (n=48).

Figure S6 -
Figure S6 -Binary phenotype matrix of in silico predictions of metabolic functions of 39 cultured bacterial strains recovered from BEED duodenal aspirates.

Figure S7 -
Figure S7 -Analysis of the distribution of bacterial strains along the length of the intestines of gnotobiotic mice.
TableS8-qPCR assays for enteropathogens.Enteropathogen burden in (A) duodenal aspirates from children with EED (n=36) and (B) feces from children with EED (n=23) and healthy reference children (n=19).Reported values are absolute measurements of DNA mass in pg (all bacteria and protists) and copy number (all RNA viruses).Table S9 -Mirpur-18 diet given to gnotobiotic mice.(A) Ingredients used to formulate diet.(B) Nutritional analysis of diet.Table S10 -Distribution of members of the bacterial consortium along the length of the gastrointestinal tracts of recipient gnotobiotic mice.Values represent fractional abundances.Rows with '-' indicate missing samples.Table S11 -Differential expression of genes in the duodenums of gnotobiotic mice colonized with the cultured bacterial consortium from children with EED versus CONV-D controls.List of differentially expressed genes ranked according to p-value (DESeq2).Table S12 -Flow cytometric analysis of immune cell populations present in the small intestinal epithelium and lamina propria plus mesenteric lymph nodes of gnotobiotic mice colonized with the EED bacterial consortium and CONV-D controls.