So what is HBase? Yes, it’s the Bigtable-like structured storage for Hadoop HDFS, but how exactly does it work? What is the architecture? When is it a good fit, and when is it not? This post will help answer those questions. More…
Apache Spark is an open-source data analytics cluster computing framework originally developed in the AMPLab at UC Berkeley. Spark fits into the Hadoop open-source community, building on top of the Hadoop Distributed File System (HDFS). However, Spark promises performance up to 100 times faster than Hadoop MapReduce for certain applications…and that’s why you should care!
Spark’s in-memory cluster computing is very well suited to machine learning algorithms. These videos will give you a nice introduction to Spark, how it’s being used in business, and why you should care. Watch Videos
What’s Hot in Data Science? Well, there has been a lot of talk recently about deep learning, a subset of Machine Learning that allows machines to classify what they perceive.
Adam Gibson (Data Scientist and Co-Founder, Blix.io) presents his open-source, distributed deep-learning framework, Deeplearning4j. He demos sentiment analysis and facial recognition tools. If you are using or learning Machine Learning then you should watch this video. Watch Video
Hi All, here are some of the top roles in Data Science / Big Data / Analytics that we are working to fill on behalf of our clients.
Currently, we have 90+ open roles, each of which is full-time permanent, consulting, and/or “right-to-hire”. If you are in the market, please have a look. Read More
Great (free) Machine Learning course for beginners from Caltech. Introduction to supervised, unsupervised, and reinforcement learning. Components of the learning problem. Lecture 1 of 18 of Caltech’s Machine Learning Course, CS 156, by Professor Yaser Abu-Mostafa. Watch Video
(Reposted due to popular demand) Another great video from Josh Wills. Josh is Sr. Director of Data Science at Cloudera and has a gift for making fairly complicated technology explanations very digestible to the novice and intermediate techie.
What I most love about this video is how clearly Josh explains the challenge of taking Machine Learning analytics built on a large set of data records (many individuals) and making them work well in a “real life” production environment for a single individual (think eCommerce). Watch Video
A Masters in Data Science “Will Hunting Style”. More and more people are learning online via the flood of excellent “open source” resources: classes, ebooks, software, etc. Clare Corthell has created a website that allows anybody to take virtually the same curriculum offered for a Masters in Data Science, for free.
Will it be an official Masters? No. But an official Masters is not what is needed; what is needed is knowledge and experience working with the tools and techniques necessary to actually do Data Science. This free curriculum will allow business-line leaders, Analysts, and Programmers from other fields to fill in the education gaps, get better at their jobs, and move one step closer to being an actual Data Scientist. Read More
Machine learning is a subfield of computer science and artificial intelligence that deals with the construction and study of systems that can learn from data, rather than follow only explicitly programmed instructions (Wikipedia).
If you are thinking of becoming a Data Scientist or Advanced Analytics professional, you will absolutely need to master Machine Learning. These 100 Most Popular Talks on Machine Learning topics are a great resource to learn from. Review List
The Data Science Report has created a new section “PhD PREDICTION PROJECTS!” In this section we are going to profile interesting predictive analytics projects by “up and coming” PhD and Masters students/recent grads.
Our first project looks at Eric Chalmer’s computer model that uses Machine Learning to predict the outcome of MMA/UFC fights and how much to wager on them. While initially a hobby, his algorithms have performed remarkably well (20% ROI) during the first 8 months, and we’re going to track his predictions here. Read More
This guide by Robert Schneider, who also wrote Hadoop for Dummies, gives you everything you need to know about choosing the right Hadoop distribution for production. Read More
Watch Webinar on Thursday Sept 4th at 12:30pm EST – 45 mins. “How is the Internet of Things Driving the Adoption of Apache Spark™?”
WHY? Hear from Big Data experts on what is available in the industry today, what’s anticipated, and why they think Spark is the next big thing for Big Data. What is Spark? Why is it relevant? How does Spark play into a more intelligent use of data? Read More