The scale of HDFS continues to soar. For large social media and cloud providers, Hadoop clusters have grown so large that it is hard to test this basic component of classic Hadoop at scale before rollouts. That is another of the niggling issues that slows Hadoop adoption.
At LinkedIn, the challenge of successfully making even small configuration changes across broad arrays of HDFS led a team to create Dynamometer. This load and stress test suite uses actual NameNodes combined with simulated DataNodes to prove out settings changes across the large Hadoop data farms that help LinkedIn link people together.
Today, many issues are impossible to test without running a cluster similar in size to the one used in production, according to Carl Steinbach, senior staff software engineer at LinkedIn. He said one of the project’s goals is to have a positive “upstream effect” on the releases of Apache community members, in effect making testing at scale a foremost part of the effort.
Steinbach’s colleague, engineer Eric Krogen, adds that HDFS developers are looking toward the day when they can find bugs before new versions are committed, rather than six months later, when new software reaches very large-scale clusters.
In this edition of the Talking Data Podcast, the crew speaks with analyst Mike Matchett, of the SmallWorldBigData consultancy, to get a better view into Dynamometer. Testing tools of this kind will only gain in importance going forward, he suggested.
“Whatever big data we have today, it’s going to be bigger tomorrow,” said Matchett.
Also on tap in this podcast is a discussion of TensorFlow from Google. The company has supported work on this machine learning framework on CPUs, GPUs and, most recently, TPUs – Tensor Processing Units built especially to accomplish the highly iterative computations of neural networks. Word is that Google is preparing to open up TPU processing to outsiders that use its Google Cloud Platform. – Jack Vaughan
The post LinkedIn open sources Dynamometer load and stress tester for HDFS appeared first on Talking Data Podcast » Episodes.