Cartographers in the genome labyrinth

The 3D landscapes of the genome have to change all the time. Otherwise genes would not be expressed properly and physiology would be in disarray. Ana Pombo’s group is mapping this genomic regulation.

Imagine sitting at a computer and examining millions of photos taken by tourists on a single day in Berlin. There are a lot of shots of crowds – how much can you deduce about the people in them? Those two over there – are they a pair, or strangers? One way to find answers might be to combine information from lots of images. Friends tend to walk around together, or meet over and over in different places, so their faces should appear in the same images more often. By combining facial recognition software with some powerful computing and statistics, you'd probably discover a lot of relationships.

A new way of studying the dynamic genome architecture

This parallels a strategy that Ana Pombo's lab, in collaboration with Mario Nicodemi’s lab in Naples, is using to tackle one of the most challenging questions in today's science. Her group at the Berlin Institute for Medical Systems Biology of the MDC is interested in the relationships between DNA sequences rather than those between tourists. Several years ago Ana conceived a new way of studying how cells dynamically reorganize their DNA to carry out vital functions. The project, a collaboration between Ana's lab and groups in Italy, the UK and Canada, required solving a number of challenging technical problems. Now everything has finally come together in a publication in the journal Nature.

The main issue the scientists are trying to understand revolves around how genes are activated to produce RNA and proteins, then switched off again when the molecules are no longer needed. The systems that control these processes are vital to nearly everything that happens in our cells. Timing is crucial: the failure to produce a molecule when it is needed – or producing one when it shouldn't be – can spell the difference between life and death for a cell and organism.

Knowing the positions of genes is not enough

For years scientists have known that gene activation and silencing involve spatial rearrangements of the strands of DNA that make up chromosomes in the cell nucleus. Genes occupy specific positions along these strands; the instructions that help control their activity usually lie somewhere else. They may be near or far away on the same strand of DNA, and in extreme cases they may be located on another chromosome.

These are colonies of mouse embryonic stem cells, where cell nuclei are stained in blue. The DNA from the nuclei is sequenced to infer the relative positions of genes and their switches.
Image: C. Ferrai, MDC

The Human Genome Project produced a nearly comprehensive list of human genes and their "addresses" on chromosomes. But it has been harder to locate the instructions or match them to the genes they regulate. The strands are too thin to track their movements directly, even using the most powerful light microscopes. Another problem is the huge amount of DNA in the nucleus – imagine examining a tangle of yarn the size of the Earth in hopes of observing an encounter between two individual strands.

Biochemical methods “fix” DNA strands that touch each other

Recent years have seen the development of clever biochemical approaches to detect such events. Some methods "fix" DNA in place and then use enzymes to cut away loose regions that aren't attached to anything. This leaves the fragments that are connected – including genes and regulatory regions. They can now be sequenced and located in the map of the genome, allowing scientists to fill in some blank spaces on the map.

But Ana says these methods don't work well with types of cells that are rare, or with every sort of tissue, so they miss events that play a crucial role in human health. Another drawback is that generally only two strands at a time are captured. In some cases several regions are involved in activating a gene, so some of the partners escape.

A statistical approach to solve the puzzle

Several years ago, while working at the MRC Clinical Sciences Centre in the UK (recently renamed the MRC London Institute of Medical Sciences, LMS), Ana and Paul Edwards from the University of Cambridge conceived a very different approach to the problem. She calls it Genome Architecture Mapping, or GAM. Developing the method required overcoming some major technical obstacles, solving an elaborate statistical puzzle, and finding ways to check the results.

The approach is something like scanning photographs to find relationships between people. In place of facial recognition software, Ana's lab uses sequencing to identify DNA. Instead of taking photographs of crowds, the group flash-freezes the nuclei of cells and cuts them into thin, disc-like cross sections. At that point the basic principle becomes the same: two strands that meet to activate genes are somewhat more likely to appear in the same slice than "strangers".
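The counting idea behind the photo analogy can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the group's actual pipeline: the function name `cosegregation_counts`, the locus labels and the toy slices are all hypothetical, and each slice is reduced to the set of loci detected in it.

```python
from collections import Counter
from itertools import combinations


def cosegregation_counts(slices):
    """Count, for every pair of loci, how many slices contain both.

    `slices` is an iterable of collections of locus names (hypothetical
    input format; one collection per sequenced nuclear slice).
    """
    pair_counts = Counter()
    for loci in slices:
        # Sort so each pair is always stored in the same order.
        for a, b in combinations(sorted(set(loci)), 2):
            pair_counts[(a, b)] += 1
    return pair_counts


# Hypothetical toy data: the loci detected in each of four slices.
slices = [
    {"enhancer_E", "gene_Y"},
    {"enhancer_E", "gene_Y", "gene_X"},
    {"gene_X"},
    {"enhancer_E", "gene_Y"},
]

counts = cosegregation_counts(slices)
print(counts[("enhancer_E", "gene_Y")])  # slices containing both E and Y
print(counts[("enhancer_E", "gene_X")])  # slices containing both E and X
```

In this toy data the enhancer shares a slice with gene Y more often than with gene X, which is the kind of signal the approach looks for. Raw counts alone do not say whether a pairing is more frequent than chance would predict.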

Relative distance between genes. Top: Diagram with red coding for sequences that are close together and blue coding for sequences that are further apart. The gene switch (enhancer, E) is close to gene Y but not to gene X. Bottom: Position in the nucleus. Credit: Pombo Lab, MDC

The first step in finding matches is to look at lots of slices and identify all the sequences, then to apply math and statistics to distinguish real relationships from chance encounters. Ana brought this problem to the attention of Mario Nicodemi, at the University of Naples; Mario has been an Einstein BIH Visiting Professor at the MDC since 2017. Mario devised a statistical method to solve one of the key issues: to determine the probability that any two particular sequences might appear in the same slice by chance. If you knew that, you'd know how many "photos" you'd need to examine to detect true partnerships. Antonio Scialdone, a former PhD student of Mario’s and joint first author of the study, says that the "magic number" starts from a minimum of around 300, and the information gets more detailed as more “photos” are collected.
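To make the "by chance" question concrete, here is a minimal Python sketch. It is not the statistical method developed for the study. It assumes the simplest possible baseline, namely that two unrelated loci land in slices independently of each other and that slices are independent draws, and it uses a plain binomial tail to ask how surprising an observed number of shared slices would be. The function names and all numbers are hypothetical; the 300 slices are used only because that figure is mentioned above.

```python
from math import comb


def chance_cooccurrence(n_slices, n_a, n_b):
    """Probability that loci a and b share a slice if they were independent.

    n_a and n_b are the numbers of slices in which each locus was detected.
    """
    return (n_a / n_slices) * (n_b / n_slices)


def binomial_tail(n_slices, n_both, p_chance):
    """P(X >= n_both) for X ~ Binomial(n_slices, p_chance)."""
    return sum(
        comb(n_slices, k) * p_chance**k * (1 - p_chance) ** (n_slices - k)
        for k in range(n_both, n_slices + 1)
    )


# Hypothetical numbers, chosen only for illustration.
n_slices = 300      # slices examined
n_a, n_b = 30, 30   # slices containing locus a, locus b
n_both = 12         # slices containing both

p = chance_cooccurrence(n_slices, n_a, n_b)
expected = p * n_slices
p_value = binomial_tail(n_slices, n_both, p)

print(f"expected shared slices by chance: {expected:.1f}")
print(f"observed shared slices: {n_both}")
print(f"probability of seeing at least that many by chance: {p_value:.2e}")
```

The sketch also hints at why the number of slices matters: with few slices, the expected chance overlap and the observed overlap are both tiny, and the two cannot be told apart reliably.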

The first application came up with a list of disease-related candidates

Their first study applied GAM to mouse embryonic stem cells and produced a list of links between genes and other regions – a lot of evidence, all of it circumstantial. Did the pairings really say anything about how cells control the activity of genes? Finding out required trawling through databases and the scientific literature in search of reports of connections that had already been confirmed in other types of experiments. The result: the method works, Ana says. GAM confirmed known relationships between regulatory regions and particular genes while exposing many new ones.

Those need to be confirmed through additional experiments, but the initial results are exciting. For one thing, they shed light on many genes whose activity is disturbed in some very serious diseases. In some cases, the problem lies within the sequence of a gene, but defects in regulatory regions can be equally dangerous. The evolutionary relationship between mice and humans means that many of the regulatory and gene sequences found in the study have human counterparts. So the new data provides a long list of new suspects that can now be scrutinized by researchers, and Ana's group has already begun.

The new method opens up new possibilities

Demonstrating that GAM works is just the beginning. More of the new matches between regulators and genes need to be verified by other types of experiments, which will help expose their roles in cells. The study needs to be expanded to new types of cells – especially those affected by specific diseases. "Often the types of cells affected by diseases are very specific," says Robert Beagrie, a former PhD student of Ana’s and first author of the publication. "In their natural positions in human tissue, the particular cells affected by a disease are normally mixed up with many other cell types that are less affected. GAM should allow us to target the affected cells very specifically, so that we can identify the regulatory regions most important for our disease of interest."

GAM fills in a lot of blank spaces in our knowledge map of genome functions, opening new territories to explore and survey. Ultimately Ana hopes that the method will provide a list of the regulatory sequences in the genome that is as complete as our catalog of genes. This information is an open invitation to pursue even more profound questions: The bringing together and separation of strands of DNA has to be coordinated across the entire genome. What mechanisms are responsible for such enormous feats of cellular management, and how do they communicate with each other? Answering those questions will represent a giant leap forward for biomedical research.

Accompanying Press Release

A three-dimensional map of the genome

Cells face a daunting task. They have to neatly pack a thread of genetic material several meters long into a nucleus that measures only five micrometers across. This origami creates spatial interactions between genes and their switches, which can affect human health and disease. Now, an international team of scientists has devised a powerful new technique that ‘maps’ this three-dimensional geography of the entire genome. Their paper is published in Nature.


Robert A. Beagrie1,2,3, Antonio Scialdone4, Markus Schueler1, Dorothee C. A. Kraemer1, Mita Chotalia2, Sheila Q. Xie2, Mariano Barbieri1,5, Inês de Santiago2, Liron-Mark Lavitas1,2, Miguel R. Branco2, James Fraser6, Josée Dostie6, Laurence Game7, Niall Dillon3, Paul A. W. Edwards8, Mario Nicodemi4 & Ana Pombo1,2,5,9 (2017): “Complex multi-enhancer contacts captured by Genome Architecture Mapping (GAM).” Nature. doi:10.1038/nature21411

1Epigenetic Regulation and Chromatin Architecture Group, Berlin Institute for Medical Systems Biology, Max-Delbrück Centre for Molecular Medicine in the Helmholtz Association, Berlin, Germany. 2Genome Function Group, MRC London Institute of Medical Sciences (previously MRC Clinical Sciences Centre), Imperial College London, London, UK. 3Gene Regulation and Chromatin Group, MRC London Institute of Medical Sciences (previously MRC Clinical Sciences Centre), Imperial College London, London, UK. 4Dipartimento di Fisica, Università di Napoli Federico II, and INFN Napoli, Complesso Universitario di Monte Sant’Angelo, Naples, Italy. 5Berlin Institute of Health (BIH), Berlin, Germany. 6Department of Biochemistry and Goodman Cancer Research Centre, McGill University, Montréal, Québec, Canada. 7Genomics Laboratory, MRC London Institute of Medical Sciences (previously MRC Clinical Sciences Centre), Imperial College London, London, UK. 8Hutchison/MRC Research Centre and Department of Pathology, University of Cambridge, Cambridge, UK. 9Institute for Biology, Humboldt-Universität zu Berlin, Berlin, Germany.

Robert A. Beagrie and Antonio Scialdone contributed equally to this work. Mario Nicodemi and Ana Pombo jointly supervised this work.

Featured image: public domain, Pixabay