Big Data at UGent

Big data is a computing paradigm that combines decentralized data storage with decentralized processing. It is one of the ways to cope with increasing data volume, variety, and velocity (the three V's of Big Data).

UGent/Klarrio Big Data Streaming Analytics Benchmark

If you are dealing with big data and need high throughput, low latency, and adaptive online algorithms, streaming analytics is the way to go. In recent years, a lot of R&D has gone into the development of so-called streaming frameworks. Note that a streaming context, in which incoming streams are aggregated, is quite different from online analytics, which is performed on streaming data that has been enriched with static data. Additionally, streaming data types can differ greatly from case to case, and little is known about how different frameworks react to different types of data.

Although the industry acknowledges the power of streaming analytics, practitioners struggle to decide which framework will suit their needs and solve their problem in an optimal way.

Therefore, the UGent Big Data Analytics Team, in close collaboration with Klarrio, is developing a streaming analytics benchmark.

The included frameworks are Apache Spark, Apache Flink, Apache Storm (Trident) and Kafka Streams.

The first phase of the benchmark will focus on measuring throughput and latency for a basic streaming job comprising the following steps:

  1. Ingest: Read event data from Kafka.
  2. Basic transformations: Parse the data.
  3. Joins: Join datastreams across topics together.
  4. Aggregation: Compute aggregated metrics for each measurement point.
  5. Window operations: Evolution of the metrics over specified look-back periods.
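To make the five steps above concrete, here is a minimal in-memory sketch in plain Python. It is not the benchmark's implementation and does not use any of the frameworks listed above: the Kafka topics are replaced by Python lists, and the event schema (`point`, `lane`, `ts`, `flow`, `speed`), the sample values, and all function names are assumptions made purely for illustration.

```python
import json
from collections import defaultdict

# Hypothetical sample data: (point, lane, ts, value). The real schema is not specified.
FLOW_TOPIC = [json.dumps({"point": p, "lane": l, "ts": t, "flow": f})
              for p, l, t, f in [("A", 1, 0, 10), ("A", 2, 0, 20),
                                 ("A", 1, 1, 15), ("A", 2, 1, 25), ("B", 1, 0, 5)]]
SPEED_TOPIC = [json.dumps({"point": p, "lane": l, "ts": t, "speed": s})
               for p, l, t, s in [("A", 1, 0, 100), ("A", 2, 0, 80),
                                  ("A", 1, 1, 90), ("A", 2, 1, 70), ("B", 1, 0, 60)]]

def ingest(topic):
    """Step 1: stand-in for reading raw messages from a Kafka topic."""
    yield from topic

def parse(messages):
    """Step 2: parse each raw JSON message into a dict."""
    for raw in messages:
        yield json.loads(raw)

def join(flows, speeds):
    """Step 3: join the two streams on (point, lane, ts).

    For simplicity the whole speed stream is buffered first; a real
    streaming join would only keep a bounded window of events.
    """
    speed_by_key = {(e["point"], e["lane"], e["ts"]): e["speed"] for e in speeds}
    for e in flows:
        key = (e["point"], e["lane"], e["ts"])
        if key in speed_by_key:
            yield {**e, "speed": speed_by_key[key]}

def aggregate(joined):
    """Step 4: per measurement point and timestamp, sum flow and average speed over lanes."""
    groups = defaultdict(list)
    for e in joined:
        groups[(e["point"], e["ts"])].append(e)
    for (point, ts), events in sorted(groups.items()):
        yield {"point": point, "ts": ts,
               "flow": sum(e["flow"] for e in events),
               "speed": sum(e["speed"] for e in events) / len(events)}

def window(aggregated, lookback=1):
    """Step 5: change in total flow compared with `lookback` time units earlier."""
    history = {}
    for e in aggregated:
        history[(e["point"], e["ts"])] = e["flow"]
        previous = history.get((e["point"], e["ts"] - lookback))
        if previous is not None:
            yield {**e, "flow_change": e["flow"] - previous}

def run_pipeline(flow_topic, speed_topic, lookback=1):
    """Chain the five steps and collect the output."""
    flows = parse(ingest(flow_topic))
    speeds = parse(ingest(speed_topic))
    return list(window(aggregate(join(flows, speeds)), lookback))

result = run_pipeline(FLOW_TOPIC, SPEED_TOPIC)
```

With the sample data, only point A at `ts` 1 has an earlier observation to compare against, so `result` holds a single record for that point. In a real framework, each of these functions would map to an operator on an unbounded stream rather than a pass over a finite list.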

A second phase of the project will include data augmentation by enriching streaming data with static data. The third phase will extend the first and second phase by implementing analytical models for predictive and prescriptive analytics.
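Enriching streaming data with static data amounts to a lookup of each incoming event against a reference table. The sketch below illustrates the idea in plain Python; the table contents, the field names (`point`, `road`, `speed_limit`), and the choice to drop events without a static record are all hypothetical and not taken from the benchmark.

```python
# Hypothetical static reference data keyed by measurement point ID.
STATIC_POINTS = {
    "A": {"road": "R1", "speed_limit": 100},
    "B": {"road": "R2", "speed_limit": 130},
}

def enrich(events, static_table):
    """Attach static attributes to each streaming event.

    Events whose measurement point has no static record are dropped here;
    keeping them with empty attributes would be an equally valid choice.
    """
    for event in events:
        meta = static_table.get(event["point"])
        if meta is not None:
            yield {**event, **meta}

# A finite list stands in for the incoming stream.
stream = [
    {"point": "A", "ts": 0, "speed": 90.0},
    {"point": "C", "ts": 0, "speed": 50.0},  # no static record for "C"
]
enriched = list(enrich(stream, STATIC_POINTS))
```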

The benchmark will use open data from traffic information of the Netherlands and will be open-sourced.

For further information, or if you have any questions, please do not hesitate to contact us: Giselle.VanDongen AT UGent.be

Big Data Events

We are proud to announce that our Big Data team is again represented at the Apache Big Data conference on May 16-18, 2017 in Miami, FL. The talk is by Dirk Van den Poel ("Big Data Analytics Using (Py)Spark For Analyzing IPO Tweets"). Last year, we had three talks at the Apache Big Data event on May 9-12, 2016 in Vancouver, Canada. The three talks were by Bram Steurtewagen ("Data Science Applied: A Utilities Sector Case Study"), by Tijl Carpels ("On the fly retraining of predictive analytical models using Spark Streaming: An equity-price direction prediction case study"), and by Dirk Van den Poel ("Spark Big Data Analytics for Business, Finance and Marketing").

Several UGent professors in Big Data offer a training program. Click here (in Dutch) for more information. After two very successful editions, we decided to intensify our efforts in 2017.

Big Data Publications

Most recent publications in the field of Big Data:

VERCAMER D., STEURTEWAGEN B., VAN DEN POEL D. & VERMEULEN F. (2017), Predicting Consumer Load Profiles Using Commercial and Open Data, IEEE Transactions on Power Systems, 31 (5).

VAN DEN POEL D., CHESTERMAN C., KOPPEN M. & BALLINGS M. (2016), Equity Price-Direction Prediction For Day Trading: Ensemble Classification Using Technical Analysis Indicators With Interaction Effects, IEEE WCCI Proceedings of the IJCNN Conference.

Big Data Projects

We have a strong cooperation with Klarrio, the leading Big Data IoT and Analytics Co. in the Benelux.

Since Jan. 2016, we have partnered with the insurance company Corona Direct for a large-scale IoT Usage-Based Insurance research project.

Blog posts cover some of our recent Big Data projects. For example, Total Refineries asked us to apply industrial analytics to an IoT (Internet of Things) case. The team compared two open-source analytics environments (R versus Python + Spark) for the task at hand; unfortunately, all other details are confidential.

Since Sept. 2013, we have been teaching Apache Hadoop/HBase/Hive/Spark in our two state-of-the-art master's degrees: Master of Science in Marketing Analysis and Master of Science in Business Engineering: Data Analytics.

Since Sept. 2013, we have been actively involved in several research projects that use Big Data technology for analytics.

Past Conference Participations

Blog entries related to Big Data:

  • ACM KDD 2017 in Halifax, NS (Canada)
  • Apache Big Data North America 2017 in Miami, FL
  • INFORMS Business Analytics 2017 in Las Vegas, NV
  • Spark Summit East 2017 in Boston, MA
  • IBM Spark Technology Center Meeting Feb. 2017 in Boston, MA
  • Student Presentations in Big Data class 2016 in Ghent, Belgium
  • IEEE Big Data 2016 in Washington, DC
  • AMPLab End of Project Event in Berkeley, CA
  • INFORMS Annual Meeting 2016 in Nashville, TN
  • Spark Summit Europe 2016 in Brussels, Belgium
  • ACM KDD 2016 in San Francisco, CA
  • IEEE WCCI 2016 in Vancouver, Canada
  • Apache Big Data 2016 in Vancouver, Canada
  • INFORMS Business Analytics 2016 in Orlando, FL
  • Spark Summit East 2016 in New York City, NY
  • FOSDEM 2016 in Brussels, Belgium
  • UC Berkeley's AMPLab Winter Retreat in Lake Tahoe, CA
  • NIPS 2015 in Montreal, Canada
  • SC15 Supercomputing Conference in Austin, TX
  • INFORMS 2015 Annual Meeting in Philadelphia, PA
  • INFORMS 2014 Annual Meeting in San Francisco
  • ACM KDD 2014 in New York City
  • MSI 2014 Conference on Marketing in Data-Rich Environments in San Francisco, CA
  • INFORMS Big Data Conference in San Jose, CA
  • ASE 2014 Big Data Conference at Stanford University
  • VOSEKO Alumni lecture on Big Data/IoT ...
  • INFORMS 2014 Business Analytics and Big Data Conference in Boston, MA
  • Agoria Data-Driven Innovation in Brussels, Belgium
  • IEEE ICDM Conference in Dallas, TX
  • Sogeti BI Symposium in Amsterdam, The Netherlands
  • SC13 Supercomputing in Denver, CO
  • Agoria BigData Opening event in Brussels, Belgium + The Data-Driven Bank
  • INMA 2013 in Berlin, Germany
  • DMA 2013 in Chicago, IL
  • INFORMS Annual Meeting 2013 in Minneapolis, MN
  • KDD 2013 in Chicago, IL
  • OSCON 2013 Open Source Convention in Portland, OR
  • Oracle 2013 Big Data Forum in Belgium
  • Strata + Hadoop New York City 2012 Big Data Conference
  • Sogeti Belux 2012 Conference on Big Data in Brussels
  • O'Reilly's Strata 2012 Big Data Conference in Santa Clara, CA