Recent Publications [2010 onwards]

Gabriel Iuhasz, Daniel Pop, Ioan Dragan – Architecture of a Scalable Platform for Monitoring Multiple Big Data Frameworks, in Scalable Computing: Practice and Experience, Vol 17, No 4, 2016
Abstract: Latest advances in information technology and the widespread growth in different areas are producing large amounts of data. Consequently, in the past decade a large number of distributed platforms for storing and processing large datasets have been proposed. Whether in development or in production, monitoring applications running on these platforms is not an easy task, and dedicated tools and platforms have been proposed for this purpose. In this paper we present a distributed, scalable, highly available platform able to collect, store, query and process monitoring data obtained from multiple Big Data frameworks. We present its architecture and the initial results obtained.

Daniel Pop, Gabriel Iuhasz, Dana Petcu – Distributed Platforms and Cloud Services: Enabling Machine Learning for Big Data, in Data Science and Big Data Computing. Frameworks and Methodologies, by Mahmood, Zaigham (Ed.), Springer, 2016
Abstract: Applying popular machine learning algorithms to large amounts of data has raised new challenges for machine learning practitioners. Traditional libraries do not properly support the processing of huge data sets, so new approaches are needed. Using modern distributed computing paradigms, such as MapReduce or in-memory processing, novel machine learning libraries have been developed. At the same time, the advance of cloud computing in the past 10 years could not be ignored by the machine learning community. Thus, a rise of cloud-based platforms has been of significance. This chapter aims at presenting an overview of novel platforms, libraries, and cloud services that can be used by data scientists to extract knowledge from unstructured and semi-structured, large data sets. The overview covers several popular packages to enable distributed computing in popular machine learning environments, distributed platforms for machine learning, and cloud services for machine learning, known as the machine-learning-as-a-service approach. We also provide a number of recommendations for data scientists when considering a machine learning approach for their problem.

Daniel Pop, Gabriel Iuhasz, Ciprian Craciun, Silviu Panica – Support Services for Applications Execution in Multi-Clouds Environments, 2016 IEEE International Conference on Autonomic Computing (ICAC), Self Organizing Self Managing Clouds Workshop (SOSeMC 2016), Wuerzburg, July 2016
Abstract: Deploying and running applications in multi-cloud environments is a challenging task for a number of reasons: different configuration parameters are needed for different cloud environments, application’s artefacts may vary across technologies or cloud providers, services provided at IaaS level vary from one cloud provider to another. This paper introduces a run-time platform that enables the deployment and execution of applications on multi-clouds with guaranteed quality of service (QoS), and details the underlying services of the unified layer responsible for connecting to multiple IaaS cloud providers, which avoids runtime lock-in and simplifies the management of cloud applications.

Dana Petcu, Gabriel Iuhasz, Daniel Pop, Domenico Talia, Jesus Carretero, Radu Prodan, Thomas Fahringer, Ivan Grasso, Ramon Doallo, Maria J. Martin, Basilio B. Fraguela, Roman Trobec, Matjaz Depolli, Francisco Almeida Rodriguez, Francisco de Sande, Georges Da Costa, Jean-Marc Pierson, Stergios Anastasiadis, Aristides Bartzokas, Christos Lolis, Pedro Goncalves, Fabrice Brito, Nick Brown – On Processing Extreme Data, in Scalable Computing: Practice and Experience, Vol 16, No 4, 2015
Abstract: Extreme Data is an incarnation of the Big Data concept distinguished by the massive amounts of data that must be queried, communicated and analyzed in near real-time by using a very large number of memory or storage elements and exascale computing systems. Immediate examples are the scientific data produced at a rate of hundreds of gigabits-per-second that must be stored, filtered and analyzed, the millions of images per day that must be analyzed in parallel, the one billion social data posts queried in real-time on an in-memory components database. Traditional disks or commercial storage nowadays cannot handle the extreme scale of such application data. Following the need for improvement of current concepts and technologies, we focus in this paper on the needs of data intensive applications running on systems composed of up to millions of computing elements (exascale systems). We propose in this paper a methodology to advance the state-of-the-art. The starting point is the definition of new programming paradigms, APIs, runtime tools and methodologies for expressing data-intensive tasks on exascale systems. This will pave the way for the exploitation of massive parallelism over a simplified model of the system architecture, thus promoting high performance and efficiency, offering powerful operations and mechanisms for processing extreme data sources at high speed and/or real time.

Jesus Carretero, Salvatore Distefano, Dana Petcu, Daniel Pop, Thomas Rauber, Gudula Rünger, David E. Singh – Energy-efficient Algorithms for Ultrascale Systems, in Journal of Supercomputing Frontiers and Innovations, 2015
Abstract: The chances to reach Exascale or Ultrascale Computing are strongly connected with the problem of the energy consumption for processing applications. For physical as well as economical reasons, the energy consumption has to be reduced significantly to make Ultrascale Computing possible. The research efforts towards energy-saving mechanisms of the hardware have already led to energy-aware hardware systems available today. However, hardware mechanisms can only obtain an energy reduction if software can exploit them such that energy-efficient computing actually results. In the software area, there also exists a multitude of research approaches towards energy saving. These research approaches and results are often isolated either on the system software level or the application organization level, reflecting the expertise of the corresponding research group. The challenge of reducing the energy consumption dramatically to make Ultrascale Computing possible is so ambitious that a concerted action combining all these software levels and research efforts seems reasonable. In this article, we demonstrate the current research efforts and results related to energy in the diverse areas of software. Moreover, we conclude with open problems and questions concerning energy-related techniques with an emphasis on the application algorithmic side.

Daniel Pop, Alejandro Echeverria, Dana Petcu, Gloria Conesa – Enabling Open and Collaborative Public Service Advertising through Cloud Technologies, in Zaigham Mahmood (Ed.) “Cloud Computing Technologies for Connected Government”, IGI Global, 2015
Abstract: The importance of cloud computing and its benefits for the public sector has already been recognised by national and supranational organisations. Meanwhile, public service advertising is seen as a powerful tool in the hands of public administrations in raising awareness and changing public behaviour towards a social issue. After introducing the main concepts of cloud computing, this chapter describes an interactive, cloud-enabled platform for public service advertising. During the validation phase, which involved seven European case studies, we not only learnt the benefits the platform brings to both data producers and consumers, but also identified the gap between these two sides. In order to bridge this gap we propose a novel, open and collaborative platform for public advertising based on semantic Web technologies for service discovery and message delivery. Enabling technologies of the platform are next identified and, finally, the deployment on hybrid cloud environments is discussed.

Daniel Pop, Dana Petcu, Marian Neagul – Long Term Digital Preservation Using Cloud Services, 7th RO-LCG Conference, Bucharest-Magurele, 2014
Abstract: Cloud computing is adopted nowadays by various activity sectors. Libraries implementing Cloud-based solutions for their digital preservation environment can also benefit from the advantages offered by combining private Cloud deployments with public Cloud usage. This paper particularly addresses the deployment of digital preservation solutions over Multi-Cloud environments. After discussing different Cloud deployment strategies, we present the overall architecture of a digital preservation environment and how its components can be assigned to different Clouds.

Daniel Pop, Alejandro Echeverria, Juan Vicente Vidagany – Integrating Social Media and Open Data in a Cloud-based Platform for Public Sector Advertising, Web Information Systems Engineering (WISE 2014) Workshops, Thessaloniki, 2014
Abstract: Nowadays, Public Sector Advertising (PSA) is conveyed as a unidirectional top-down stream of messages that clearly separates the content producers (governments usually) from content consumers (citizens). As social networks and Linked Open Government Data (LOGD) initiatives are moving e-Government forward towards connected government, PSA platforms need to embrace the modern paradigms of empowering citizens and communities to increasingly and actively participate in the functioning of society for their own benefit. In this position paper, firstly we present our findings related to the use of content from social networks as public ads and secondly, we propose an open and collaborative platform that supports semantically-enabled, participative PSA.

Daniel Pop, Marian Neagul, Dana Petcu – On Cloud deployment of digital preservation environments, 2014 IEEE/ACM Joint Conference on Digital Libraries (JCDL), London, 2014
Abstract: Although migrating library applications to a Cloud environment is not an easy task, many libraries are interested in using Cloud infrastructure services broadly across their businesses, whether it is a Public, Private or Hybrid Cloud. One of the migration expectations is the scalability of digital preservation architectures in Cloud environments. In this paper we address the scalability and portability of storage and compute platforms, which combine storage of large datasets and their processing. Concretely, we propose a toolkit developed using the Puppet configuration management system that facilitates the deployment of complex digital preservation platforms over heterogeneous Cloud environments and we present, as a use case, its integration with the SCAPE platform.

Daniel Pop – The Treasure of Public Sector Information, 1st Share-PSI 2.0 Workshop: Uses of Open Data Within Government for Innovation and Efficiency, Samos, 2014
Abstract: The treasure waiting to be discovered here is the big investments in Public Sector Information made by Governments around Europe in the past decade that are still “hidden” due to under-usage by their intended audience, the citizens. How can we unleash this hidden treasure? How can we increase the visibility of existing local, regional, national, European stocks of public sector information (PSI) to boost citizen-centric e-Government? How much will digging out this treasure cost Public Administrations (PA)? The aim of the SEED (Speeding Every European Digital) solution is to boost “citizen-centric” e-Government services, to reuse as much as possible the European, national, regional and local stocks of PSI and to reduce the costs of e-Government and e-Governance deployments through a cloud computing approach and a very cheap network of interactive PSA nodes. SEED is making mash-ups of e-Government contents for raising awareness of citizens about e-Government services available across all Europe. It is about transforming PSI into interactive advertisement messages. The paper describes the SEED platform and the technological platform that powers it, highlights the main concepts and presents the initial findings after more than one year of field deployment of seven pilots within six EU countries.

Daniel Pop, Gloria Conesa, Alejandro Echeverria – Towards Automated Discovery and Composition in Public Service Advertising, 14th European Conference on eGovernment (ECEG), Brasov, 2014
Abstract: Public Service Advertising (PSA) is still predominantly built as a unidirectional top-down stream of messages without empowering citizens and communities to participate and enhance their own and collective benefits, without extending transparency and openness, without personalising services for individual users. A new cloud-based framework for innovation in PSA is proposed that follows a semantic strategy, taking into account the different challenges in a multi-domain and multilingual context, aiming to: (i) unlock the positive network effect of Linked Open Government Data (LOGD) by boosting the automated discovery and composition of services for PSA, and (ii) enhance PSA effectiveness and increase citizens' engagement through service personalisation, relying on feedback measurement to improve the quality of the service in a loop process.

Daniel Pop, Vasiliki Moumtzi, Josefina Farinos – The good, the bad and the beauty of advertisement for public sector services, 14th European Conference on eGovernment (ECEG), Brasov, 2014
Abstract: How can we reuse existing local, regional, national and European stocks of public sector information (PSI) to boost citizen-centric e-Government? How can we save costs on e-Governance deployments by reusing existing infrastructure of public service advertising (PSA) networks? How can recent technological developments in cloud computing remove the burden of complex IT&C setups from public administrations? These are challenges faced by governments all over the world, specifically in Europe, where political and economic diversity increases the complexity of the environment. In this paper we present a cloud-based solution that reuses existing PSI, making mash-ups of e-Government contents for raising citizens’ awareness of e-Government services available across all Europe. It boosts “citizen-centric” e-Government services by reusing European, national, regional and local stocks of PSI and it enables cost savings in e-Government and e-Governance deployments through a very cheap network of interactive PSA nodes.

Daniel Pop, Caius Bogdanescu – Ontology-based Recommender for Distributed Machine Learning Environment, ACSys Workshop @ 15th International Symposium on Symbolic and Numeric Algorithms for Scientific Computing, Timisoara, 2013
Abstract: Domain experts in different areas have a large number of options for approaching their specific data analysis problem. In exploration of large data sets on HPC systems, choosing which method to use, or how to tune the parameters of an algorithm to achieve good results are challenging tasks for data analysts themselves. In this paper, we propose a recommendation module for a distributed machine learning environment aiming at helping the end-users to obtain optimized results for their data analysis / machine learning problem.


Daniel Pop – Machine Learning and Cloud Computing: Survey of Distributed and SaaS Solutions, Technical Report of Institute e-Austria, Timisoara, 2012
Abstract: Applying popular machine learning algorithms to large amounts of data raised new challenges for the ML practitioners. Traditional ML libraries do not support well the processing of huge datasets, so new approaches were needed. Parallelization using modern parallel computing frameworks, such as MapReduce, CUDA, or Dryad gained in popularity and acceptance, resulting in new ML libraries developed on top of these frameworks. We will briefly introduce the most prominent industrial and academic outcomes, such as Apache Mahout, GraphLab or Jubatus. We will investigate how the cloud computing paradigm impacted the field of ML. The first direction is that of popular statistics tools and libraries (R system, Python) deployed in the cloud. A second line of products is augmenting existing tools with plugins that allow users to create a Hadoop cluster in the cloud and run jobs on it. Next on the list are libraries of distributed implementations for ML algorithms, and on-premise deployments of complex systems for data analytics and data mining. The last approach on the radar of this survey is ML as Software-as-a-Service, with several BigData start-ups (and large companies as well) already opening their solutions to the market.

Daniel Pop, Gabriel Iuhasz – Overview of Machine Learning Tools and Libraries, Technical Report of Institute e-Austria, Timisoara, 2011
Abstract: Over the last three decades many general-purpose machine learning frameworks and libraries emerged from both academia and industry. The aim of this overview is to survey the market of ML tools and libraries and to compare them in terms of features and supported algorithms. As there is a large number of solutions available that offer a large spectrum of features, we start by introducing a set of criteria, grouped in four categories, for both pruning and comparing the candidates. Based on these criteria, we synthetically present the results and discuss the findings in each category.
