Project Topics

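Note: The field of statistical genetics continues its unprecedented growth and expansion, and new technologies and ideas are constantly opening up new areas of research. The projects below are therefore only proposed projects. They may change between now and the start of the program, because we will choose the most timely, relevant and meaningful research to complete in the summer of 2019.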

Project Area #1. Human Statistical Genetics Research Background and Aims (Project Topics)

The technological and computational breakthroughs in the years since the sequencing of the human genome have provided an unprecedented opportunity to understand the etiology of complex human diseases. Notably, the diminishing cost of next-generation sequencing means that it is now possible for researchers to obtain complete genome sequence information on many thousands of individuals. However, major statistical questions remain about optimal design and analysis of studies using next-generation sequencing data to study the contribution of rare variation to common diseases. At the foundation of many such questions is the lack of power for single marker, rare variant tests of association, motivating the development of many, potentially more powerful, variant-set tests, which aggregate evidence from many individual variants into a single test statistic. Research by our group has developed a framework for evaluating the performance of existing variant-set tests on moderately sized variant sets (<100 variants). We then utilized this framework to provide a clear understanding of test performance in a variety of circumstances, develop novel robust and powerful tests, evaluate method performance in light of genotype uncertainty, develop methods to more precisely characterize underlying genetic architecture and apply these methods to better understand the genetics of fatty acids and high blood pressure. Continued technological innovations and lower costs mean that we continue to experience rapid increases in sample size, the proportion of the genome that is sequenced and sequence coverage, which all lead to increases in the number of variants being analyzed. However, mounting evidence suggests poor performance of many widely used variant-set methods as the number of variants increases into the hundreds and thousands. 
Current and newly proposed variant-set based tests which attempt to address large variant situations vary in how they combine and weight variants, leading to poorly understood differences in performance under different genetic models. Regardless of which tests emerge as optimal in the presence of large numbers of genetic variants, several challenges will remain toward applying these methods to real, imperfect data and then inferring underlying genetic architecture based on a statistically significant test result. Our research will start by gaining a deeper understanding of the behavior of variant-set tests in the presence of large numbers of variants, the realistic application of these tests and the development of methods to decompose significant test statistics to gain information that can guide future studies, leading to a variety of novel approaches and better understanding of why certain methods perform the way they do. This work will provide a critical step towards successfully identifying risk variants in future sequencing experiments.
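The idea behind a variant-set test, aggregating evidence from many individual variants into a single test statistic, can be illustrated with a simple burden-style statistic. The sketch below is illustrative only and is not the framework or any of the tests developed by our group. The function names, the toy genotypes and the default equal weights are all assumptions made for the example.

```python
import math

def burden_scores(genotypes, weights=None):
    """Collapse each individual's minor-allele counts into one weighted score."""
    n_variants = len(genotypes[0])
    if weights is None:
        weights = [1.0] * n_variants  # assumption: equal weight for every variant
    return [sum(w * g for w, g in zip(weights, row)) for row in genotypes]

def burden_z(genotypes, is_case, weights=None):
    """Two-sample z-type statistic comparing mean burden in cases vs. controls."""
    scores = burden_scores(genotypes, weights)
    cases = [s for s, c in zip(scores, is_case) if c]
    controls = [s for s, c in zip(scores, is_case) if not c]
    mean_case = sum(cases) / len(cases)
    mean_ctrl = sum(controls) / len(controls)
    var_case = sum((s - mean_case) ** 2 for s in cases) / (len(cases) - 1)
    var_ctrl = sum((s - mean_ctrl) ** 2 for s in controls) / (len(controls) - 1)
    se = math.sqrt(var_case / len(cases) + var_ctrl / len(controls))
    return (mean_case - mean_ctrl) / se

# Toy data (hypothetical): rows are individuals, columns are rare variants,
# entries are minor-allele counts.
genotypes = [
    [1, 0, 1], [0, 1, 0], [1, 1, 0], [0, 0, 1],  # cases
    [0, 0, 0], [1, 0, 0], [0, 0, 0], [0, 1, 0],  # controls
]
is_case = [True] * 4 + [False] * 4

z = burden_z(genotypes, is_case)
print(f"burden z-statistic: {z:.3f}")
```

Passing a different `weights` vector, for example one that up-weights rarer variants, changes how the variants are combined. This is the kind of design choice that distinguishes variant-set tests from one another.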

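Aim 1: Develop a framework for understanding the behavior of variant-set tests. We propose a framework for genetic sequence data that relates variation between cases and controls to the vector of allele frequency differences. The framework directly relates the parameters of the genetic disease model to test performance, giving analytic insight into the behavior of existing tests. We anticipate that this framework will be an important tool for the prospective analysis of existing and novel test statistics, avoiding laborious simulation and allowing testing strategies to be optimized in advance based on assumptions about a specific genetic model.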

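Aim 2: Evaluate variant-set tests in the presence of imperfect data. Most variant-set tests have been evaluated on simulated sequence data that assume perfect genotypes, perfect phenotypes and well-defined sets of variants. In reality, all of these data are subject to uncertainty. We will develop comprehensive models of data uncertainty and then use analytic and simulation approaches to incorporate these models into the framework developed in Aim 1, leading to optimized and novel testing approaches and to better-informed study design.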

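Aim 3: Develop post-hoc analyses to identify causal variants and inform replication study design. Following a statistically significant variant-set association, post-hoc methods are needed to decompose the test statistic and extract the key information required to design replication studies and infer the underlying genetic model. We will develop novel methods to disentangle variant-set test statistics in order to estimate the genetic architecture of the locus, and we will use this information to identify cost-effective replication designs.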

Project Area #2. Bacterial Statistical Genetics Research Background and Aims (Project Topics)

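A systems-level understanding of microbial life requires truly integrated data, methods and tools that will enable researchers to approach previously inaccessible questions. These include the transformative questions in health, environment, energy and food outlined in the 2009 National Research Council report "A New Biology for the 21st Century", as well as the fundamental biological questions that underpin them. What functional diversity exists in the microbial world? How do these functions interact to produce the ecosystems we observe? What drives these interactions? How did these systems come to be, that is, how did they evolve? An important part of answering these types of questions is obtaining a solid, predictive understanding of the metabolic functions and regulatory strategies of microbes and microbial communities. Equally important is the training of a new generation of scientists, mathematicians, computer scientists and engineers who are proficient in using systems-level approaches to solve the problems of the "New Biology". The PIs have a successful track record of integrative approaches to systems biology, which uniquely positions them to make significant progress in integrating the types of data needed to address these fundamental questions while training undergraduates at all levels in interdisciplinary systems science. Three developments position us to approach a predictive understanding of microbial metabolic and regulatory function: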

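  1. Recent breakthroughs in next-generation sequencing (NGS) technologies provide unprecedented access to genomic and transcriptomic data across diverse bacterial species, extending far beyond the traditional, well-understood model organisms.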
  2. Over the past five years, there have been significant advances in the development and understanding of genome-wide metabolic networks for bacterial organisms. In particular, the field of metabolic modeling has moved from mostly manual development of metabolic networks (Francke, C. et al. 2005) to semi- and completely-automated approaches, which are being applied to thousands of sequenced bacterial genomes through our earlier efforts (see below). These efforts have leveraged the subsystems approach to genome annotation as implemented in the SEED (Ross Overbeek et al. 2005) and the rapid, accurate annotation of new microbial genome sequences through the RAST (Aziz et al. 2008), to map genome annotations to biochemical reactions and automatically build metabolic networks. These networks serve as the foundation for steady state metabolic modeling techniques such as Flux Balance Analysis (FBA), which use linear programming to analyze the flow of metabolites through a metabolic network (Orth & B. Ø. Palsson 2011). As a result of our work on development of the ModelSEED (C. S. Henry et al. 2010; Matthew DeJongh et al. 2007), we have now generated metabolic models (MMs) for over 3000 microbial organisms annotated using the SEED/RAST system. This serves as a valuable resource for the modeling community to explore properties of metabolic networks and interactions between microbes (Freilich et al. 2011).
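  3. In parallel, there has been significant progress in the characterization of genome-scale transcriptional regulatory networks (TRNs) for sequenced microbes. Methods for reconstructing an organism's TRN have used two basic approaches: (1) identification of the binding motifs of specific transcription factors (TFs) in an organism to determine potential TF-target gene relationships (D. A. Rodionov 2007) and (2) discovery of TF-target relationships through statistical analysis (machine learning) of genome-wide gene expression data (e.g., (Faith et al. 2007)). TRNs are being systematically predicted for major bacterial groups using the first approach (e.g., (D. Rodionov et al. 2011; D. Ravcheev et al. 2011)) and are made available through databases such as RegPrecise (P. Novichkov, Laikova, et al. 2010).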
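The Flux Balance Analysis formulation mentioned in item 2 can be sketched on a toy network: steady state requires S·v = 0 for the stoichiometric matrix S and flux vector v, and linear programming finds the flux distribution that maximizes an objective such as biomass production. The four-reaction network, its bounds and all variable names below are hypothetical and are not drawn from the ModelSEED. The example assumes NumPy and SciPy are installed.

```python
import numpy as np
from scipy.optimize import linprog

# Toy network (hypothetical). Internal metabolites: A, B.
# R1: uptake -> A      R2: A -> B
# R3: B -> biomass     R4: A -> secreted byproduct
# Rows of S are metabolites, columns are reactions.
S = np.array([
    [1.0, -1.0,  0.0, -1.0],  # mass balance for A
    [0.0,  1.0, -1.0,  0.0],  # mass balance for B
])

# Flux bounds: uptake is limited to 10 units, other reactions are irreversible.
bounds = [(0, 10), (0, 1000), (0, 1000), (0, 1000)]

# linprog minimizes, so maximize the biomass flux (R3) by minimizing its negative.
c = np.array([0.0, 0.0, -1.0, 0.0])

# Steady-state constraint: S @ v == 0 for every internal metabolite.
result = linprog(c, A_eq=S, b_eq=np.zeros(S.shape[0]), bounds=bounds, method="highs")

fluxes = result.x
biomass_flux = fluxes[2]
print("optimal fluxes:", fluxes)
print("biomass flux:", biomass_flux)
```

In this toy case the uptake bound is the only limiting constraint, so the optimal solution routes all uptake through R2 and R3 and none through the byproduct reaction. Genome-scale models have the same structure but with thousands of reactions.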

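One of the greatest challenges in the field of metabolic modeling is that metabolic models must capture gene regulatory information in order to more accurately model an organism's metabolic response to its environment (Lewis et al. 2012). Furthermore, this integration ultimately needs to be achieved at scale, with integrated models for thousands of organisms.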

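In this proposal, we lay out a series of methodological advances that will (1) effectively combine these three data types (metabolic models, TRNs and gene expression data) into integrated metabolic regulatory models (iMRMs) and (2) provide applications that will aid researchers in the systematic exploration of microbial metabolic and regulatory function.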

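Aim 1. We will develop new and improved methods for building integrated metabolic regulatory models (iMRMs) that address weaknesses in current approaches. Current approaches to iMRM development ignore or significantly under-interpret expression data. We have identified a series of methodological improvements that will substantially improve iMRM development. In particular, we will develop methods to (a) rigorously estimate gene activation states (active, inactive) from expression data and (b) use the improved gene state estimates in the creation of iMRMs. These methods will be evaluated using in silico modeling techniques.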

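Aim 2. We will develop methods that use iMRMs to predict conditions for wet-lab experiments that generate and test new biological hypotheses. We will develop methods that use the iMRMs developed in Aim 1 to suggest wet-lab experiments, by exploring publicly available datasets in the context of perturbations of the metabolic network. This will allow researchers to uncover new biology, further refine iMRMs and link metabolic reactions to genes of unknown function. We will conduct targeted wet-lab experiments on model organisms, generating RNA-seq transcriptomic data and gene function validation for specific hypotheses.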

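Aim 3. We will develop and apply new ways of using MMs and iMRMs for thousands of organisms to better understand metabolic and regulatory diversity. Given current access to thousands of MMs (see below) and the availability of iMRMs for many organisms (a result of Aim 1), we will be in a unique position to develop and apply methods that use these MMs and iMRMs to explore the impact of environment, evolution and community on metabolic and regulatory diversity.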