Keynote Talks

Rosa M. Badia

Simplifying the development of complex workflows for distributed computing infrastructures

Distributed computing infrastructures are becoming ever more complex, combining sensors, edge devices, and instruments with high-end computing systems such as clouds and HPC clusters. A key question is how to describe and develop the applications to be executed on such platforms. Very often these applications are not standalone but comprise a set of sub-applications or steps composing a workflow. The trend is for workflow components to be of different natures, combining, for example, computationally intensive and data analytics components. Scientists rely on effective environments to describe their workflows and on engines to manage them in complex infrastructures. COMPSs is a task-based programming model that enables the development of workflows that can be executed in parallel on distributed computing platforms. The workflows currently supported may involve different types of tasks, such as parallel simulations (MPI) or analytics (e.g., written in Python thanks to PyCOMPSs, the Python binding for COMPSs). Through a storage interface, COMPSs makes access to persistent data transparent, whether it is stored in key-value databases (Hecuba) or object-oriented distributed storage environments (dataClay). While COMPSs has been designed from its early days for distributed environments, we have been extending it to deal with more challenging settings that include edge devices and fog components that can appear and disappear. Examples of new features considered in these environments are task failure management and input/output data streams. The talk will present an overview of the challenges of workflow development in these environments and of how they can be tackled with COMPSs.
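
For readers unfamiliar with the model, a minimal sketch of what a PyCOMPSs workflow looks like (the imports follow the public PyCOMPSs API; the task function and numbers are hypothetical, not taken from the talk):

    from pycompss.api.task import task
    from pycompss.api.api import compss_wait_on

    @task(returns=float)
    def simulate(param):
        # placeholder for a computationally intensive step
        return param ** 2

    if __name__ == "__main__":
        # Each call to a @task function returns a future; the COMPSs runtime
        # builds the task graph and schedules it over the available resources.
        futures = [simulate(i) for i in range(10)]
        values = compss_wait_on(futures)  # synchronize results with the runtime
        print(sum(values) / len(values))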

Hervé Goëau

Recognizing the world’s flora: dream or reality?

Plant diversity is one of the major elements in the development of human civilization and plays a crucial role in the functioning and stability of ecosystems. However, our knowledge of plants is far from complete given the impressive number of nearly 400,000 species, with more than 2,000 new species discovered each year. Over the past two decades, the biodiversity informatics community has made considerable efforts to develop global initiatives, digital platforms and tools to help biologists organize, share, visualize and analyze biodiversity data. However, the burden of systematic plant identification severely penalizes the aggregation of new data and knowledge at the species level. Plant experts spend a lot of time and energy identifying species when their expertise could be more useful in analyzing the data collected. Automated identification has made considerable progress, particularly in recent years, thanks to the development of convolutional neural networks (CNNs), as evidenced by the long-term evaluation of automated plant identification organized as part of the LifeCLEF initiative. Nowadays, the best systems evaluated so far are able to compete with human experts, and more and more plant identification applications are being developed, such as Pl@ntNet or Seek (iNaturalist). Thanks to their rapidly growing audience, they provide an opportunity to monitor biodiversity on a large scale and to aggregate new specific knowledge. However, these applications face the problem of being either limited to the flora of particular regions or limited to the most common species, while there are more and more species with a transcontinental range, such as naturalized alien species and cultivated plants. Fragmenting identification into regional floras is less and less reliable, while focusing only on the planet's most common species is obviously no better from a biodiversity standpoint. In this talk, we will discuss the possibilities and limitations of deep learning and high performance computing to build a plant identification system working at the scale of the world's flora.
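
For context, systems of this kind are typically built by fine-tuning a large pretrained CNN on labeled plant observations; a hedged sketch in PyTorch (the class count is hypothetical and this is not Pl@ntNet's actual pipeline):

    import torch
    import torch.nn as nn
    from torchvision import models

    num_species = 10000  # hypothetical; world-flora systems target far more classes

    model = models.resnet50(pretrained=True)                  # ImageNet-pretrained backbone
    model.fc = nn.Linear(model.fc.in_features, num_species)   # replace the classification head

    optimizer = torch.optim.SGD(model.parameters(), lr=1e-3, momentum=0.9)
    criterion = nn.CrossEntropyLoss()

    def train_step(images, labels):
        # one optimization step over a batch of labeled plant images
        optimizer.zero_grad()
        loss = criterion(model(images), labels)
        loss.backward()
        optimizer.step()
        return loss.item()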


Promoting Collaborations in HPC throughout Latin-America: The case of SDumont/SINAPAD

The Santos Dumont petascale supercomputer (SDumont) is the largest HPC facility in Latin America dedicated to science. It is housed at the National Laboratory for Scientific Computing (LNCC) in Petrópolis, State of Rio de Janeiro, Brazil. It currently serves more than 900 users and 120 research projects across more than 20 research areas. In this talk we will briefly introduce the SDumont supercomputer and some of the research projects carried out on it. We will then discuss possibilities for collaboration among Latin American researchers on the use of this supercomputer.


Invited Talks

Leonardo Bautista

Advanced I/O strategies developed by the Energy oriented Center of Excellence

EoCoE is the Energy oriented Centre of Excellence for computing applications. Its primary goal is to create a new, long-lasting and sustainable community around computational energy science. It aims to resolve current bottlenecks in application codes, leading to new modelling capabilities and scientific advances among its four user communities (Meteo, Materials, Water and Fusion), and it develops cutting-edge mathematical and numerical methods and tools to foster the use of Exascale computing. In this talk, we will hear about the advances the EoCoE project has made in I/O and in checkpointing scientific applications at large scale.

Ignacio Laguna

Performance-Driven Floating-Point Tuning for GPU Applications

We present a static and dynamic analysis framework to perform mixed-precision floating-point tuning on scientific GPU applications. While precision tuning techniques are available, they are designed for serial programs and are accuracy-driven, i.e., they consider configurations that satisfy accuracy constraints, but these configurations may degrade performance. Our framework (called GPUMixer), in contrast, presents a performance-driven approach for tuning floating-point programs. We demonstrate our approach on several GPU applications and show that it obtains performance improvements of up to 46.4% of the ideal speedup in comparison to only 20.7% found by state-of-the-art methods.
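
To illustrate the trade-off that such tuning explores, here is a toy NumPy sketch (not GPUMixer itself, which analyzes and transforms GPU code): lowering the precision of part of a computation reduces its cost but introduces error that a performance-driven tuner must weigh against an accuracy constraint.

    import time
    import numpy as np

    def kernel(a, b, dtype):
        # toy stand-in for a GPU kernel: a matrix product in the given precision
        return (a.astype(dtype) @ b.astype(dtype)).astype(np.float64)

    rng = np.random.default_rng(0)
    a = rng.random((1000, 1000))
    b = rng.random((1000, 1000))
    reference = kernel(a, b, np.float64)

    for dtype in (np.float64, np.float32):
        t0 = time.perf_counter()
        out = kernel(a, b, dtype)
        elapsed = time.perf_counter() - t0
        rel_err = np.abs(out - reference).max() / np.abs(reference).max()
        print(dtype.__name__, f"time={elapsed:.3f}s", f"max_rel_err={rel_err:.1e}")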

Paola Buitrago

New Systems, New Projects: PSC’s HPC, HPAI, and Data Journey Continues

Research is rapidly evolving to be even more data-centric. Artificial intelligence (AI) is driving this change, and it is increasingly enabling qualitative breakthroughs. For analyzing large-scale data, AI is helping to spot correlations and identify anomalies. For simulation and modeling, AI is reducing time to solution by orders of magnitude by replacing expensive computation with fast inferencing. To bring these capabilities to the research community, equally including fields that have traditionally used high-performance computing (HPC) and those that have not, the Pittsburgh Supercomputing Center (PSC) designs and operates advanced computing systems that bring together HPC, AI, and Big Data. This talk surveys PSC’s recent Bridges-AI and upcoming Bridges-2 systems that deliver converged computing at no charge for research and education, its Open Compass research program that centers on future AI technologies, and the Brain Image Library and Human BioMolecular Atlas Program (HuBMAP), which are creating unique, high-value datasets in the life sciences for the international community.

Daisuke Kihara

Computational protein 3D structure modeling from cryo-electron microscopy density maps

The significant progress of cryo-electron microscopy (cryo-EM) poses a pressing need for software for the structural interpretation of EM maps. In particular, protein structure modeling tools are needed for EM maps of around 4 Å resolution or worse, where building a main-chain structure is challenging. Our group is developing software for protein structure modeling by applying various algorithms, including deep learning. In this talk, I will introduce two such tools from our lab. We have developed a de novo modeling tool named MAINMAST (MAINchain Model trAcing from Spanning Tree) for EM maps of up to about 4 Å resolution (Nature Communications, 2018). MAINMAST builds main-chain traces of a protein in an EM map from a minimum spanning tree constructed by connecting high-density points, and has shown better modeling performance than existing methods. The method has recently been enhanced to model symmetric protein complexes and ligand (drug) molecules that bind to a protein in a map. Moreover, to provide structural information for maps determined at even lower resolution (5-10 Å), we have recently developed a new tool, Emap2sec, which uses a convolutional neural network (CNN) to detect protein structures (Nature Methods, 2019). Emap2sec scans an EM map with a 3D voxel and assigns a protein structure class, i.e., alpha helix, beta strand, or coil, from the density patterns of the voxel and its neighbors.
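
A simplified sketch of the graph-construction step at the heart of the MAINMAST idea (illustrative only; the actual pipeline adds several refinement and sequence-assignment steps omitted here):

    import numpy as np
    from scipy.spatial.distance import pdist, squareform
    from scipy.sparse.csgraph import minimum_spanning_tree

    def density_mst(coords, densities, threshold):
        pts = coords[densities > threshold]        # keep only high-density map points
        dist = squareform(pdist(pts))              # pairwise distances between them
        return pts, minimum_spanning_tree(dist)    # sparse matrix of tree edges

    # toy input: random "map" points with random density values
    rng = np.random.default_rng(1)
    coords = rng.random((200, 3)) * 50.0
    densities = rng.random(200)
    pts, tree = density_mst(coords, densities, threshold=0.8)
    print(pts.shape[0], "points,", tree.nnz, "tree edges")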

Etienne Decencière

Deep learning and mathematical morphology for biomedical applications

In biomedical image segmentation, we often have a priori information on the objects of interest. This information can be difficult to use with deep learning methods, whereas mathematical morphology allows us to model it in a natural manner. We propose several ways of combining both approaches in order to solve practical applications in retinal and histological image analysis.
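
One simple form of such a combination (a hedged sketch, not necessarily one of the methods presented in the talk) is to encode a size or shape prior as morphological post-processing of the network's probability map:

    import numpy as np
    from skimage.morphology import binary_opening, remove_small_objects, disk

    def postprocess(prob_map, threshold=0.5, min_size=50):
        mask = prob_map > threshold                            # binarize the CNN output
        mask = binary_opening(mask, disk(2))                   # remove thin, spurious detections
        return remove_small_objects(mask, min_size=min_size)   # enforce a minimum-size prior

    clean = postprocess(np.random.rand(256, 256))              # toy probability map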

Stefan Hoops

High-resolution computational modeling of immune responses in the gut


Rodrigo Mora

Complex networks of miRNA-transcription factors mediating gene dosage compensation

Cancer is a group of diseases characterized by uncontrolled cell growth and defined by several hallmarks, including eight functional capabilities and two enabling characteristics. Among them, genome instability and mutation are the drivers of cancer evolution. Genomic instability leads to aneuploidy in most cancers. Aneuploidy is lethal for normal cells, yet it is a hallmark of most advanced cancer cells, which must have developed mechanisms to cope with its negative effects. Indeed, aneuploidy autocatalyzes genomic instability, leading to many unstable karyotypes and to cell death. On rare occasions, however, a perfect combination of simultaneous alterations is met, overcoming the error thresholds in cancer evolution and leading to malignant cells able to handle aneuploidy and genomic instability. This bottleneck in evolution represents a gate to evolving malignant karyotypes that lead to drug resistance and metastasis. There is evidence of a hidden karyotypic pattern, implying that a stable mechanism does exist within the genomic instability of cancer to cope with the negative effects of aneuploidy; recognizing this pattern is a huge challenge. We postulated the hypothesis that dosage compensation of several key genes is a putative mechanism for dealing with aneuploidy in cancer. Previous evidence supports the existence of genes under dosage compensation. We therefore explored this phenomenon in the genomic and transcriptomic data of the NCI60 panel and identified a cluster of genes with low variation in expression despite high variation in copy number. These genes are distributed across several chromosomal locations, and we hypothesized that their dosage compensation could be mediated by a network of miRNAs and transcription factors, since these form network motifs with systems-level properties, including incoherent feedforward circuits that have been reported to show adaptation to the amount of their genetic templates. Indeed, several miRNAs and transcription factors correlate with the copy number variation of the candidate compensated genes. We therefore built a biocomputational platform to generate complex networks of these interactions and to automatically construct ODE mathematical models of the NCI60 panel of cancer cell lines using mass-action kinetics. After model fitting, we were able to reproduce dosage compensation for MYC and STAT3 and to identify several candidate miRNAs and transcription factors mediating this phenomenon. After steady-state simulations and perturbation experiments, we reduced the model complexity and identified a minimal model of gene dosage compensation for MYC and STAT3 involving four miRNAs with redundant compensating functions. Inhibiting those miRNAs with miRNA inhibitors in an experimental model of colon cancer cells induced cytotoxicity in the cells with high MYC copy numbers. This dosage compensation mechanism for MYC and STAT3 may represent a sentinel core for the simultaneous dosage compensation of many other genes and a novel target against aneuploid cancers. Future work is required to confirm the effect of gene dosage compensation on patient survival and to identify other compensation cores to guide personalized precision therapies against cancer.
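
As a worked toy example of the adaptation property mentioned above (an illustrative incoherent feedforward circuit, not one of the authors' fitted NCI60 models): when the same gene dosage drives both an mRNA and a miRNA that represses it, the mRNA steady state becomes almost insensitive to copy number.

    from scipy.integrate import solve_ivp

    def circuit(t, y, N, a=1.0, b=1.0, dM=0.5, dR=0.5, k=2.0):
        M, R = y                             # miRNA and mRNA levels
        dMdt = a * N - dM * M                # miRNA transcribed from N gene copies
        dRdt = b * N - dR * R - k * M * R    # mRNA from N copies, repressed by the miRNA
        return [dMdt, dRdt]

    for N in (1, 2, 4, 8):                   # increasing gene copy number
        sol = solve_ivp(circuit, (0, 200.0), [0.0, 0.0], args=(N,), rtol=1e-8)
        print(f"copies={N}  mRNA steady state={sol.y[1, -1]:.3f}")

With these toy parameters the steady-state mRNA changes by only about 10% while the copy number increases eight-fold, which is the qualitative behavior referred to above as gene dosage compensation.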

Verónica Melesse Vergara

On the road to Exascale: Past, Present, and Future leadership-class systems at the OLCF

For over 25 years, the Oak Ridge Leadership Computing Facility (OLCF) has been a leader in high performance computing and has provided leadership-class systems to researchers around the world. This talk will provide a brief history of computing at the OLCF and introduce its newest supercomputer, Summit, currently ranked number one on the TOP500 list. It will also discuss the different user programs that allow scientific projects to benefit from the large-scale computing resources available through the OLCF.

Industry Talks


Current challenges and Trends in the design of HPC systems

The talk presents the current challenges and trends in HPC systems. Aspects such as in-memory computing, Big Data and Artificial Intelligence workloads, and cooling constraints greatly affect system design. The most radical change, however, comes from the new paradigm of quantum computing.

Joaquim

Dell EMC HPC Solutions Portfolio. Making innovation real with the convergence of HPC and AI

Fabio

The Journey of Memory-Driven Computing

Jaime Puente

The Convergence of HPC and AI: How Lenovo is addressing this trend

It is becoming clear that the supercomputing field is taking the next giant step in its evolutionary path. Researchers and commercial enterprises have been increasingly applying high performance computing (HPC) and artificial intelligence (AI) to gain scientific insights and add value to their organizations. To support these areas, Lenovo has been driving research projects with universities and proof-of-concept enterprise AI solutions with ISVs, using popular ML/DL frameworks coupled with HPC to show how the overall workflow can be optimized for performance and improved insight. Training a deep learning model requires large amounts of data and demands computing and networking capabilities similar to those of HPC workloads. In this talk, I will present how Lenovo is approaching this convergence of HPC and AI with its own HPC/AI technology, showcasing some high-impact research and enterprise projects as examples. Finally, I will review the opportunities and challenges of AI in the HPC landscape.

Copyright © Carla 2019