I attended this year’s Strata + Hadoop World Conference in NYC at the Javits Center late last week. There were “boat loads” of speakers, tutorials, and vendors pitching their latest, greatest software, solutions, and hardware to attack the “big data” opportunity.
Here are some notes from the conference, along with other articles and job openings of note from last week. Read More
Apache Spark is an open-source data analytics cluster computing framework originally developed in the AMPLab at UC Berkeley. Spark fits into the Hadoop open-source community, building on top of the Hadoop Distributed File System (HDFS). However, Spark promises performance up to 100 times faster than Hadoop MapReduce for certain applications…and that’s why you should care!
Spark’s in-memory cluster computing is very well suited to machine learning algorithms. These videos will give you a nice introduction to Spark, how it’s being used in business, and why you should care…Watch Videos…
(Reposted due to popular demand) Another great video from Josh Wills. Josh is Sr. Director of Data Science at Cloudera and has a gift for making fairly complicated technology explanations very digestible to the novice and intermediate techie.
What I love most about this video is how clearly Josh explains the challenge of taking machine learning analytics built on a large set of data records (many individuals) and making them work well in a “real life” production environment for a single individual (think eCommerce). Watch Video
Hakka Labs has a very compelling video of Dr. Chris Wiggins, Chief Data Scientist at the NYTimes and a Columbia University academic. He talks about using machine learning and large data sets in both academia and business.
He shares some of the ways that re-framing domain questions as machine learning tasks has opened up new avenues for understanding, both in academic research and in real-world applications. Read More
In this “Data Skeptics Meetup” video, Dr. Jerry Smith, Chief Data Scientist at Capgemini Advanced Digital Intelligence and Data Science & Analytics, explores how data science is impacted by one of the most complex data sets of all time, the deep web.
In this talk you will learn what the Deep Web is, what Open Source Intelligence is, and how the “bad guys” are using it to orchestrate violence. Read More
Feeling sick today? Heard something was “going around”?
Well, these two companies are making it easy for all of us to instantly see on a local map what’s “going around” us from an illness perspective (viruses, allergies, etc.). This can help us avoid or diagnose an illness more quickly and get on with our daily lives. Read More
So what is HBase? Yes, it’s the Bigtable-like structured storage for Hadoop HDFS, but how exactly does it work? What is the architecture? When is it a good fit, and when is it not? This post will help answer those questions. More…