More than 80 people showed up at our last meetup, which focused on Spark. Because there are more and more signs that Spark will become the successor to Hadoop MapReduce, we invited some people who are already using Spark in production.
Andy gave an introduction to functional programming and Scala in just 45 minutes, which is definitely not enough to cover all the details. His slides can be found here
Excellent meetup. The Scala introduction was so quick that it blew my mind but gave me enough information to follow the rest
Nice fast-paced presentation with style from Jared
To complement this “gentle introduction to R” ;-), our second presentation was given by DataCamp. They introduced us to what DataCamp stands for, how their platform is architected and how teachers write courses (completely in R!). Again, your feedback says more than a thousand words …
This presentation impressed me the most. How a couple of students from the KULeuven can start-up their own company and be successful in filling the gap of training services in R
—Jean-Jacques DE CLERCQ
(Hey guys @datacamp, if you are reading this from over there in LA at the useR! conference, can we share your slides here?)
What did you learn? What do you think about this meetup?
The Deverkiezingen.be website is another example of using social media to gain insight into the "mother of all elections". Philippe explained how they used ElasticSearch and D3.js, among other technologies.
It has been quite a while since we actually posted something on our website. Wow!!! Time really flies.
On April 4th 2014, we had our 22nd meetup already. Klaas Bosteels was able to attract 2 prominent speakers from Cloudera who were touring Europe and presenting at the 2014 Hadoop Summit in Amsterdam.
Jon Hsieh (Software Engineer @ Cloudera and HBase Committer/PMC Member) talked about Apache HBase: Now and the future: Apache HBase is a distributed non-relational database that provides low-latency random read/write access to massive quantities of data. This talk will be broken up into two parts. First I’ll talk about how in the past few years, HBase has been deployed in production at companies like Facebook, Pinterest, Groupon, and eBay and about the vibrant community of contributors from around the world, including folks at Cloudera, Salesforce.com, Intel, HortonWorks, Yahoo!, and XiaoMi. Second I’ll talk about the features in the newest release 0.96.x and in the upcoming 0.98.x release.
Kate Ting (Technical Account Manager @ Cloudera and Sqoop Committer/PMC Member, co-author of the Apache Sqoop Cookbook) presented Apache Sqoop: Unlock Hadoop: Unlocking data stored in an organization’s RDBMS and transferring it to Apache Hadoop is a major concern in the big data industry. Apache Sqoop enables users with information stored in existing SQL tables to use new analytic tools like Apache HBase and Apache Hive. This talk will go over how to deploy and apply Sqoop in your environment as well as how to transfer data from MySQL, Oracle, PostgreSQL, SQL Server, Netezza, Teradata, and other relational systems. In addition, we’ll show you how to keep table data and Hadoop in sync by importing data incrementally as well as how to customize transferred data by calling various database functions.
And of course, Accenture was kind enough to host us at their gorgeous venue in Brussels with that spectacular view! They presented their Big Data Challenge, in which 4 teams of about 5 consultants took a deep dive into big data and data science to solve some practical cases. You can get in touch with their consultants to learn more.
At the very least, the detailed example of predicting public transport delays in real time was really interesting. It made my hands itch to start a new BigData.be project!
It’s rather short notice unfortunately, but we’ll be doing another meetup on July 16th in Ghent, featuring a very promising talk by renowned data geek Jimmy Lin about Twitter’s big data mining infrastructure. Space is limited, so you should head to our corresponding meetup page straight away to reserve your spot.
The analytics platform at Twitter has experienced tremendous growth over the past few years in terms of size, complexity, number of users, and variety of use cases. This talk will discuss the evolution of the Twitter infrastructure and the development of capabilities for data mining on “big data”. We’ll share experiences as a case study, but also make recommendations for best practices and point out opportunities for future work.
About the speaker
Jimmy Lin is an associate professor in the iSchool at the University of Maryland, with appointments in the Institute for Advanced Computer Studies (UMIACS) and the Department of Computer Science. He works on “big data”, with a particular focus on large-scale distributed algorithms for text processing. His research lies at the intersection of natural language processing (NLP) and information retrieval (IR). Recently, Jimmy spent an extended sabbatical (from 2010 to 2012) at Twitter working on large-scale data analytics. Previously, he has also done work for Cloudera, the enterprise Hadoop company.
The friendly folks of NGDATA in Ghent will host our 9th meetup. Thanks in advance for that!
Besides a location, we are always looking for interesting things to discuss during the meetup. Have you read something interesting in the big data/NoSQL space lately? Are you implementing something amazing right now? Do you have a problem that you want to discuss? Let us know!