PhD Research Fellow Position in Machine Learning and Software Engineering
Predictive Health Modelling of Evolving Software Ecosystems
University of Mons - Department of Computer Science - Belgium

Building on recent developments in machine learning and open source software (OSS) ecosystem analysis, an innovative and ambitious 5-year research project will start in 2021. It aims to develop prediction, simulation and discovery models to analyse and predict the health of OSS ecosystems and their constituent software components. The project is led by Tom Mens (Full Professor, Software Engineering Lab) and Souhaib Ben Taieb (Associate Professor, Big Data and Machine Learning Lab), both renowned experts in their respective research domains, working at the Department of Computer Science of the University of Mons.

Belgium is centrally located in Europe, and the labs are well connected to other research teams worldwide. Our group hosts researchers of various nationalities, making a research position with us an ideal stepping stone towards an independent research career in academia or industry.

Open PhD Research Fellowship

Four-year PhD positions are open on this project. Qualified candidates should hold a Master's degree or equivalent in computer science or a related domain, with a background in machine learning and/or software engineering. A good knowledge of statistics and prior experience in data analysis and open source software development are highly recommended. Candidates should be proficient in English and have good oral and written communication skills.

Interested applicants should contact the principal investigators by e-mail at tom.mens@umons.ac.be and souhaib.bentaieb@umons.ac.be. Official applications should be submitted at your earliest convenience and should contain at least:
Project summary

Open Source Software (OSS) is indispensable in today's software-driven society and industry. OSS communities manage and evolve ecosystems containing millions of interconnected software components released and maintained by thousands of geographically distributed contributors. Software ecosystems face a wide range of health issues induced by bugs, security vulnerabilities, incompatible component updates, and unmaintained, deprecated or outdated component releases. Because of the highly connected and inherently collaborative nature of the socio-technical networks of software ecosystems, these issues frequently impact (transitively) related components, resulting in a combination of fine-grained (component-level) and coarse-grained (network-level) health problems. This calls for efficient prediction models and techniques for OSS ecosystem health, both at the level of individual components and at the level of the socio-technical network.

To address this need, we will extract and combine fine-grained events related to the development of individual software components (e.g., new releases, new source code commits, code reviews, reported bugs and their associated fixes, message exchanges between developers) and coarse-grained events related to the evolving socio-technical network (e.g., versions, dependency constraints, new or departing contributors). This event data will be gathered from various sources: software distribution managers, version control systems, bug and issue trackers, and online communication channels. Modelling such data is particularly challenging, notably because of its complex temporal dynamics and the heterogeneity, quality, size and complexity of the data.
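To make the idea of transitive impact concrete, the following minimal Python sketch computes which components are affected, directly or transitively, when one component has a health problem (for instance a vulnerable release). It assumes a hypothetical, simplified in-memory dependency map and only illustrates reachability over reverse dependencies; it is not the project's health model, which targets probabilistic prediction rather than plain graph traversal. The function name `transitively_affected` and the toy package names are illustrative.

```python
from collections import deque


def transitively_affected(dependencies: dict[str, set[str]], faulty: str) -> set[str]:
    """Return all components that depend, directly or transitively, on `faulty`.

    `dependencies` maps each component to the set of components it depends on
    (a hypothetical, simplified representation of an ecosystem's dependency graph).
    """
    # Invert the graph: for each component, record which components depend on it.
    dependents: dict[str, set[str]] = {}
    for pkg, deps in dependencies.items():
        for dep in deps:
            dependents.setdefault(dep, set()).add(pkg)

    # Breadth-first search over reverse dependencies, starting at the faulty component.
    affected: set[str] = set()
    queue = deque([faulty])
    while queue:
        current = queue.popleft()
        for parent in dependents.get(current, set()):
            if parent not in affected:
                affected.add(parent)
                queue.append(parent)

    # A dependency cycle can lead back to the origin; it is not "affected" by itself.
    affected.discard(faulty)
    return affected


if __name__ == "__main__":
    # Hypothetical toy ecosystem: app -> web -> http -> tls, and app/cli -> log
    deps = {
        "app": {"web", "log"},
        "web": {"http"},
        "http": {"tls"},
        "cli": {"log"},
        "tls": set(),
        "log": set(),
    }
    print(sorted(transitively_affected(deps, "tls")))  # ['app', 'http', 'web']
```

In a real ecosystem, dependency constraints and versions (the coarse-grained events mentioned above) would determine whether an edge actually transmits the problem, so a plain traversal like this overestimates the affected set.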
We will develop and apply machine learning models for the prediction and causal discovery of software health problems, based on temporal point processes and dynamic network modelling techniques, to analyse large-scale, multi-granular and evolving software ecosystem data.

Main objectives

Historical events of software development activity will be modelled using multi-dimensional point processes. These processes make it possible to capture an inherent property of software development data: past development events can strongly influence future events, affecting the health of a software component positively or negatively. We will consider state-of-the-art point process models based on deep neural networks to capture more diverse and more complex influences of past events on future events.

In addition to modelling the intrinsic temporal structure at the level of individual components, we will use dynamic network modelling to capture the temporally evolving socio-technical network of the ecosystem. Causal graph learning techniques will be used to infer the causal effects of events and associated health metrics across ecosystem components. Multi-level models will be designed to combine component-level and network-level health prediction models by capturing the complex dynamic interplay between different levels of granularity and different time scales. To do so, dynamic graph representation learning techniques will be combined with flexible neural models for temporal point processes. Efficient learning techniques will be designed to estimate the parameters of the neural models from data extracted from selected OSS ecosystems.
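The classical parametric instance of a multi-dimensional point process in which past events excite future ones is the multivariate Hawkes process. The Python sketch below evaluates its conditional intensity with an exponential kernel, as a baseline illustration only: the project itself targets neural point process models, which typically replace this hand-specified kernel with a learned function of the event history. The event types (say, 0 = reported bug, 1 = commit), the function name `hawkes_intensity` and all parameter values are hypothetical.

```python
import math


def hawkes_intensity(t, events, mu, alpha, beta):
    """Conditional intensity of each dimension of a multivariate Hawkes process
    with an exponential kernel, evaluated at time t:

        lambda_i(t) = mu[i] + sum over past events (t_k, j) with t_k < t of
                      alpha[i][j] * beta * exp(-beta * (t - t_k))

    events: list of (timestamp, event_type) pairs, event_type in 0..len(mu)-1
    mu:     baseline rate of each event type
    alpha:  alpha[i][j] = expected number of type-i events triggered by one type-j event
    beta:   decay rate of the excitation (the kernel beta*exp(-beta*s) integrates to 1)
    """
    intensities = list(mu)
    for t_k, j in events:
        if t_k < t:  # only strictly past events influence the present rate
            decay = beta * math.exp(-beta * (t - t_k))
            for i in range(len(mu)):
                intensities[i] += alpha[i][j] * decay
    return intensities


if __name__ == "__main__":
    # Hypothetical history: a commit (type 1) at t=1.0, then a bug report (type 0) at t=2.0
    history = [(1.0, 1), (2.0, 0)]
    mu = [0.1, 0.5]
    alpha = [[0.2, 0.3],   # bugs excited by past bugs / past commits
             [0.0, 0.4]]   # commits excited by past bugs / past commits
    print(hawkes_intensity(3.0, history, mu, alpha, beta=1.0))
```

Because the kernel is normalised, each `alpha[i][j]` can be read directly as a branching ratio, which keeps the "influence of past events on future events" interpretable; the neural models mentioned above trade some of this interpretability for the ability to capture more complex influences.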
The resulting machine learning algorithms and models will be used to provide practical predictive and discovery models for the health analysis of OSS ecosystems, taking into account the diversity of activities, granularities and temporal scales, as well as the socio-technical aspects. They will be used to analyse and predict health issues in upcoming component releases, and to assess network-level health impact by predicting how events with a positive or negative effect on health influence other ecosystem components.