Big Data and Data Science – 27th meetup

Our 27th meetup as a joint venture with DataScience.be was a huge success! The goal was to give a thorough introduction to Big Data to the data scientists and business people of both organizations.

In total, 221 participants registered across both communities! Unfortunately, quite a lot of people did not make it last night, probably due to the EU summit that was happening yesterday. (Note that the meetup was held at the VUB in Elsene.) Still, it was a huge crowd.

Presentations on Big Data for Data Science

Philippe Van Impe, co-organizer of DataScience.be, gave an overview of last year's activities of the DataScience.be community. He focused specifically on their data for good and hackathon initiatives. In the presentation, he hid a product placement for BigBoards: on one of the pictures from a hackathon, Kris Peeter’s Hex was visible in the foreground. The Hex was used to do social network analysis!

Next, the DataScience.be team that has been working on the Médecins Sans Frontières (MSF) project presented an overview of their work and results. The team was led by Edward Vanden Berghe. They received a dataset from MSF on the organisation’s donations, screened it for donor segments and looked for actionable insights to help MSF increase its revenue.

As the third speaker, I gave an introduction to Big Data and what it can mean to organisations, large and small. Finally, I touched on the importance of data science to give meaning to the data.

Daan Gerits took over and got into the details of how to set up a scalable and resilient Big Data architecture.

After the break, Ferdinand Casier and Mathias Verbeke presented their EluciDATA project, which starts in 2015. The goal is to help Belgian companies with data innovation. Any questions or requests for participation can be sent to info@elucidata.be!

And last but not least, Karim Douïeb explained how they are using Spark to analyse call detail records (CDRs) for mobile operators. Really interesting!
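The talk’s code isn’t reproduced here, but at the heart of many CDR analyses sits a per-subscriber aggregation. As a purely illustrative sketch, assuming a hypothetical record layout of caller, callee and call duration, this plain-Python version imitates Spark’s map/reduceByKey pattern so it runs without a cluster:

```python
from typing import Iterable, NamedTuple


class CallRecord(NamedTuple):
    """Hypothetical, heavily simplified call detail record."""
    caller: str
    callee: str
    duration_s: int


def reduce_by_key(pairs, combine):
    """Local stand-in for Spark's reduceByKey: merge all values that share a key."""
    acc = {}
    for key, value in pairs:
        acc[key] = combine(acc[key], value) if key in acc else value
    return acc


def usage_per_caller(records: Iterable[CallRecord]) -> dict:
    """Return {caller: (number_of_calls, total_seconds)}."""
    # "map" step: emit one (key, (1, duration)) pair per record
    pairs = ((r.caller, (1, r.duration_s)) for r in records)
    # "reduce" step: add call counts and durations element-wise
    return reduce_by_key(pairs, lambda a, b: (a[0] + b[0], a[1] + b[1]))


records = [CallRecord("A", "B", 60), CallRecord("A", "C", 30), CallRecord("B", "A", 120)]
print(usage_per_caller(records))  # {'A': (2, 90), 'B': (1, 120)}
```

In PySpark the same job would roughly be an `rdd.map(...)` followed by `reduceByKey(...)`, with Spark distributing both steps over the cluster instead of looping locally.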

The meetup ended at about 21h30 with a Q&A session with all presenters together. Very thoughtful questions were raised by a sharp audience!

Thank you all for participating!!!


O’Reilly Strata Europe 2014 – impressions from a PhD student

Guest post by Vasia Kalavri

I had marked the dates of the Strata + Hadoop World conference ever since I found out, about 5 months ago, that I was coming to Barcelona for an internship. Yet, as a PhD student, I knew I had no chance of finding a way to pay such a crazy registration fee… unless I restricted my diet to rice and water, a (realistically speaking) impossible goal when living in Barcelona, surrounded by such good food.

Then, last Monday morning, while attending the tutorial sessions at papis.io, I received an e-mail with the following content: “1 FREE 2-day pass to enter the conference on Thursday and Friday! Net value €779,00!!!” And, oh my god, it was not a lie! It was a message from the Belgian Big Data community, offering a free 2-day pass to Strata to the first member who replied to the e-mail after 14h00 CET! I immediately looked at my watch: 12h24. I quickly drafted a reply and waited patiently for time to go by. At 13h59, I opened the draft, waited for the last minute to pass and pressed the “send” button, hoping that my mobile data connection wouldn’t give up on me. And… bingo! A few minutes later, I was informed that I was indeed the first one to reply. My message had arrived at 14h00 and 6 seconds :))

In the rest of this post, I will give my short, personal and biased summary of the event.

General impressions

My first thought when entering the main room on Thursday morning was “wow, this is huge”. I’ve been to several (mostly academic) conferences, but this one had both the largest number of attendees and the most fascinating venue. And it certainly looked nothing like an academic conference. At least in the beginning, it was more of a show: fancy lighting, music to introduce the keynote speakers, text-free slides! To be honest, for a moment I thought this would all be one huge marketing campaign and I would be wasting my time. In the end, I have to admit that I was very pleasantly surprised by the technical level of the talks and by the things I learned. As a systems person, I don’t often get to attend events that focus on use-cases and applications. Getting to hear about real-world use-cases was very inspiring for me!

Favorite Talks

I tried to avoid the business and industry tracks and mostly attended the Hadoop, tools and data science tracks. Among the keynotes, I especially enjoyed Geoff McGrath and Camille Fournier on Thursday and Jordan Tigani on Friday. I would definitely suggest watching them: https://www.youtube.com/playlist?list=PL055Epbe6d5Y8aARKdXVVtJnEttlhsRyf.

Among the rest of the talks, my favorites were:

  • “SAMOA: A Platform for Mining Big Data Streams”, by Gianmarco De Francisci Morales
  • “High-Level Abstractions Make Big Data Useful for Real People” (even though the talk content was quite different from what the title suggests), by Melissa Santos
  • “How Search Can Save Your Hadoop Investment and More”, by Shay Banon
  • “Realtime Data Analysis Patterns”, by Mikio Braun
  • “RT-Giraph: Online graph Mining Simplified”, by Georgios Siganos.

Most of the slides are already available here: http://strataconf.com/strataeu2014/public/schedule/proceedings

Misc and +1’s

  • I was really happy to see so many great female speakers! Keep it up, organizers!
  • I got a couple of really nice T-shirts, +1 to Cloudera for providing female sizes!
  • +1 for the food and the great sea view of the banquet room.
  • It turns out I was not the only one with a university affiliation. I met a fellow PhD student from ULB there :))

Overall, Strata was a very enjoyable experience for me. Who knows, I might even consider submitting a talk next year!
Finally, I’m really grateful to BigData.be for the free pass and, of course, to my mobile operator for delivering my e-mail with such great precision!

Short Bio

Vasia Kalavri is a PhD student at KTH, Sweden and UCL, Belgium. She is currently doing an internship at Telefonica Research and lives in Barcelona, Spain. Vasia works in the area of distributed data processing, systems optimization and large-scale graph analysis. She is a committer to Apache Flink (flink.incubator.apache.org) and also contributes to Grafos.ml (grafos.ml).

Website: http://web.ict.kth.se/~kalavri/
Twitter: @vkalavri

Strata 2014 – Claim your discount!

This year, the Strata conference takes place from 19 to 21 November 2014 in Barcelona. Besides being a gorgeous city, Barcelona now offers another reason to visit for anyone with an interest in data! To give you an idea of what Strata is, I pulled a summary from the StrataConf website.

Moreover, we got a discount code! Grab the link and code from the sponsors list on our meetup page!

About the O’Reilly Strata Conference

The best minds in data will gather in Barcelona this November for the O’Reilly Strata Conference to learn, connect, and explore the complex issues and exciting opportunities brought to business by big data, data science, and pervasive computing.

The future belongs to those who understand how to collect and use their data successfully. And that future happens at Strata.

Why You Should Attend

Strata Conference is where big data’s most influential business decision makers, strategists, architects, developers, and analysts gather to shape the future of their businesses and technologies. If you want to tap into the opportunity that big data presents, you want to be at Strata.

In a crowded marketplace of “Big Data” conferences, Strata has firmly established itself as the place where you go to meet people who think and do data science.

At Strata, you’ll:

  • Be among the first to understand how you can leverage the promise of this huge change, and survive the resulting disruption
  • Find new ways to leverage your data assets across industries and disciplines
  • Learn how to take big data from science project to real business application
  • Discover training, hiring, and career opportunities for data professionals
  • Meet face-to-face with other innovators and thought leaders

Experience Strata

Strata Conference delivers the nuts-and-bolts foundation for building a data-driven business—the latest on the skills, tools, and technologies you need to make data work—alongside the forward-looking insights and ahead-of-the-curve thinking O’Reilly is known for.

There was a palpable sense of excitement in the air. Obviously most of the attendees were already ‘data’ aficionados, but it’s clear that ‘data’ in various forms is on the radar for governments, large corporations, and the developer communities.

At Strata, you’ll find:

  • Three days of inspiring keynotes and intensely practical, information-rich sessions exploring the latest advances, case studies, and best practices
  • A sponsor pavilion with key players and latest technologies
  • A vibrant “hallway track” for attendees, speakers, journalists, and vendors to debate and discuss important issues
  • Plenty of events and opportunities to meet other business leaders, data professionals, designers, and developers

About O’Reilly

O’Reilly is followed by venture capitalists, business analysts, news pundits, tech journalists, and thought leaders because we have a knack for knowing what’s important now and what will be important next—and the ability to articulate the seminal narratives about emerging and game-changing technologies.

We don’t say this to brag. We say it to make a point: we’re not easily hypnotized by hype. We’ve seen the bubbles build and burst. For over three decades, we’ve been tapping into a deep network of alpha geeks and thought leaders to recognize the truly disruptive technologies amidst the fluff. So when we invest in a conference, we’re not just following the hype, we’re committed to creating a community around an issue we believe is transformative.

At O’Reilly, we think big data is not just important. We think it’s a game changer. That’s why we created Strata.

O’Reilly’s conferences forge new ties between industry leaders, raise awareness of technology issues we think are interesting and important, and crystallize the critical issues around emerging technologies. Understanding these emerging technologies—and how they will transform the way we do business—has never been more crucial. If you want to understand the challenges and opportunities wrought by big data, you’ll want to attend Strata.

Spark!

More than 80 people showed up at our last meetup, which focused on Spark. Because there are more and more signs that Spark will become the successor to Hadoop MapReduce, we invited some people who are already using Spark in production.

Andy gave an introduction to functional programming and Scala in just 45 minutes, which is definitely not enough to cover all the details. His slides can be found here.
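For readers who missed the talk: the core functional ideas (pure functions, immutable values and higher-order functions such as map, filter and fold) are the same ones Spark’s API is built on. A minimal sketch, written in Python rather than Scala to keep one language for all examples here, with a made-up word list:

```python
from functools import reduce


def word_lengths_total(words: tuple, min_len: int) -> int:
    """Sum the lengths of all words that have at least min_len characters.

    A pure function: it mutates nothing and its result depends only on its arguments.
    """
    long_words = filter(lambda w: len(w) >= min_len, words)  # filter: keep matching elements
    lengths = map(len, long_words)                            # map: transform each element
    return reduce(lambda acc, n: acc + n, lengths, 0)         # fold: combine, starting from 0


print(word_lengths_total(("spark", "is", "fun", "scala"), min_len=4))  # 10
```

In Scala the same pipeline reads `words.filter(_.length >= minLen).map(_.length).foldLeft(0)(_ + _)`, which is very close to how you chain transformations on a Spark RDD.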

Excellent meetup. The Scala introduction was so quick that it blew my mind but gave me enough information to follow the rest

(Eric Darchis)

Toni Verbeiren gave an introduction to Spark and demonstrated it from the command line. Follow the links to his slides and visualization code.

Very interesting mix of Scala, Spark and Use Case

(Peter Vandenabeele)

Gerard Maas showed us how Spark is used in production at Virdata.com, with a cool demo of their platform at the end. His slides are available here: Spark-at-Virdata

It was Sparkling! (Radek O)

I am always amazed by the quality of the BigData.be and ScalaBe presentations. Big up to all of you ! (Frederic)

The presentations were recorded by Parleys.com and will be published in a “bigdata.be” channel. We’ll let you know when they become available there.

Thanks to Ordina for the location and for providing food and drinks.

See you next time! We are always looking for venues and presenters.

 

Data science and R meetup

As announced on our meetup page, we had @JaredLander over from New York for a project at BigBoards. So we rose to the occasion and had him talk about Backends for Big Data in R. This comment from the meetup page says it all!

Nice fast-paced presentation with style from Jared
Marcel Dumont

To complement this “gentle introduction to R” ;-), our second presentation was given by DataCamp. They introduced us to what DataCamp stands for, how their platform is architected and how teachers write courses (completely in R!). Again, your feedback says more than a thousand words …

This presentation impressed me the most. How a couple of students from the KULeuven can start-up their own company and be successful in filling the gap of training services in R
Jean-Jacques DE CLERCQ

(Hey guys @datacamp, if you are reading this from over there in LA at the useR! conference, can we share your slides here?)

What did you learn? What do you think about this meetup?

23rd meetup – Data Science/ElasticSearch/Elections

On May 27th we had our 23rd meetup in Ghent kindly hosted by iMinds. We had a healthy mix of technical and business items.

The following presentations were given:

  1. Introduction to the Brussels Data Science Meetup by Philippe Van Impe (30 min)

Philippe gave an introduction to this Meetup, the projects they are working on (with possible links to big data) and their link to the non-profit DataKind organization (“data for good”).

His presentation can be found here.

  2. Introduction to ElasticSearch by Eric Rodriguez (60 min)

In just under an hour, Eric gave an introduction to all the features of ElasticSearch. During the presentation, Toon Vanaght showed how ES is used at data.be.

His presentation can be found here, and you are also invited to check out the Belgian ElasticSearch Meetup group.

  3. Election Bingo by Stijn Beauprez (30 min)

The vk14-bingo.be application can help you make up your mind before voting on Sunday: it gives you insight into the topics our political parties are talking about.

Stijn demoed the application and explained which technologies were used to build it.

His presentation can be found here.

  4. De Verkiezingen by Philippe Kerremans (10 min)

The Deverkiezingen.be website is another application that uses social media to get insight into the mother of all elections. Philippe explained how they used ElasticSearch and D3.js, among other technologies.

His presentation can be downloaded here: BigDataMeetupDeverkiezingen

Many thanks to iMinds for hosting the location and DataCrunchers for providing drinks.

In the meantime, we have grown to more than 700 members!

22nd meetup – Cloudera on HBase and Sqoop

It has been quite a while since we actually posted something on our website. Wow!!! Time really flies.

On April 4th 2014, we had our 22nd meetup already. Klaas Bosteels was able to attract 2 prominent speakers from Cloudera who were touring Europe and presenting at the 2014 Hadoop Summit in Amsterdam.

  1. Jon Hsieh (Software Engineer @ Cloudera and HBase Committer/PMC Member) talked about “Apache HBase: Now and the Future”. From the abstract: Apache HBase is a distributed non-relational database that provides low-latency random read/write access to massive quantities of data. This talk will be broken up into two parts. First, I’ll talk about how, in the past few years, HBase has been deployed in production at companies like Facebook, Pinterest, Groupon, and eBay, and about the vibrant community of contributors from around the world, including folks at Cloudera, Salesforce.com, Intel, Hortonworks, Yahoo!, and XiaoMi. Second, I’ll talk about the features in the newest release, 0.96.x, and in the upcoming 0.98.x release.
  2. Kate Ting (Technical Account Manager @ Cloudera and Sqoop Committer/PMC Member, co-author of the Apache Sqoop Cookbook) presented “Apache Sqoop: Unlock Hadoop”. From the abstract: Unlocking data stored in an organization’s RDBMS and transferring it to Apache Hadoop is a major concern in the big data industry. Apache Sqoop enables users with information stored in existing SQL tables to use new analytic tools like Apache HBase and Apache Hive. This talk will go over how to deploy and apply Sqoop in your environment, as well as how to transfer data from MySQL, Oracle, PostgreSQL, SQL Server, Netezza, Teradata, and other relational systems. In addition, we’ll show you how to keep table data and Hadoop in sync by importing data incrementally, as well as how to customize transferred data by calling various database functions.
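To make “importing data incrementally” a bit more concrete: in Sqoop’s append mode you name a check column (typically an auto-increment key) and the last value already imported, and only newer rows get transferred; on the command line that is `--incremental append` together with `--check-column` and `--last-value`. The sketch below only imitates that logic in Python, assuming an in-memory SQLite table as a stand-in for the RDBMS and a plain list as a stand-in for HDFS; the function name and the `orders` table are hypothetical.

```python
import sqlite3


def incremental_import(conn, table, check_column, last_value, sink):
    """Append rows whose check_column exceeds last_value to sink (hypothetical helper).

    Returns the new last_value to remember for the next run, mimicking
    the bookkeeping of Sqoop's append mode.
    """
    # Identifiers cannot be bound as SQL parameters; table/check_column are trusted here.
    query = f"SELECT * FROM {table} WHERE {check_column} > ? ORDER BY {check_column}"
    cursor = conn.execute(query, (last_value,))
    col_index = [d[0] for d in cursor.description].index(check_column)
    new_last = last_value
    for row in cursor:
        sink.append(row)
        new_last = max(new_last, row[col_index])
    return new_last


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, amount REAL)")
conn.executemany("INSERT INTO orders (id, amount) VALUES (?, ?)", [(1, 9.5), (2, 20.0)])

hdfs = []                                               # stand-in for the Hadoop side
last = incremental_import(conn, "orders", "id", 0, hdfs)     # copies rows 1 and 2
conn.execute("INSERT INTO orders (id, amount) VALUES (3, 5.0)")
last = incremental_import(conn, "orders", "id", last, hdfs)  # copies only row 3
```

Storing `last` between runs is what keeps the two sides in sync without re-copying the whole table; append mode only catches new rows, so updated rows need Sqoop’s other (lastmodified) mode.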

And of course, Accenture was so kind as to host us at their gorgeous venue in Brussels with that spectacular view! They presented their Big Data Challenge, in which 4 teams of about 5 consultants each dove into big data and data science to solve some practical cases. You can get in touch with their consultants to learn more.

The detailed example on predicting public transport delays in real time was especially interesting. It made my hands itch to start a new BigData.be project!

See you next time!