klaas.bosteels, Author at bigdata.be

Author: klaas.bosteels

Klaas is a data analysis and machine learning geek who enjoys the challenges of playing with large amounts of data. He has contributed code to several open source projects related to data processing, the largest one being Apache Hadoop, and is the original author of Dumbo, a Python API for writing and running Map/Reduce applications. He was also the main organizer of a series of Hadoop User Group UK meetups, with speakers from well-known companies such as Facebook, StumbleUpon, Playfish and Cloudera. Over the last couple of years he has also given a few Hadoop-related talks himself, a recent one being a tech talk at the LinkedIn headquarters about the use of Hadoop at Last.fm, the London-based web company where he took on the role of Data and Scalability Engineer after finishing his Ph.D. at Ghent University in December 2009. In January 2011, he moved back to Ghent to join Massive//Media as a Data Scientist.

It’s rather short notice, unfortunately, but we’ll be holding another meetup on July 16th in Ghent, featuring a very promising talk by renowned data geek Jimmy Lin about Twitter’s big data mining infrastructure. Space is limited, so head to the corresponding meetup page straight away to reserve your spot.

Talk abstract

The analytics platform at Twitter has experienced tremendous growth over the past few years in terms of size, complexity, number of users, and variety of use cases. This talk will discuss the evolution of the Twitter infrastructure and the development of capabilities for data mining on “big data”. We’ll share our experiences as a case study, but also make recommendations for best practices and point out opportunities for future work.

About the speaker

Jimmy Lin is an associate professor in the iSchool at the University of Maryland, with appointments in the Institute for Advanced Computer Studies (UMIACS) and the Department of Computer Science. He works on “big data”, with a particular focus on large-scale distributed algorithms for text processing. His research lies at the intersection of natural language processing (NLP) and information retrieval (IR). Recently, Jimmy spent an extended sabbatical (from 2010 to 2012) at Twitter working on large-scale data analytics. He has also previously worked for Cloudera, the enterprise Hadoop company.