- by Tiffiny Rossi
Last year, we reported that an autonomous car will generate over 300 TB per year. That figure was based on the driving habits of the average American. But there’s a vast stream of data currently being generated by autonomous test vehicles every day all around the world. Take, for example, Waymo, the self-driving technology development company. They recently reported that their test vehicles have logged over 8 million miles (nearly 13 million kilometers) on public roads, all the while collecting data using onboard sensors.
Calculating test vehicle data-storage needs
Pinning down exactly how many hours per day these test vehicles drive is difficult, as Waymo has not disclosed its fleet size. Forbes reporter David Silver estimated that 200 vehicles driving 8 hours per day at 15 mph would match Waymo’s reported mileage rate. Plus, as we reported in our earlier post, the sensors in an autonomous vehicle record between 1.4 terabytes (TB) and around 19 TB per hour. Combining those rates with the estimated 8 hours of driving per day, we can calculate that the data generated by one autonomous test car ranges between 11 TB and 152 TB per day! Multiply that by an estimated 200 vehicles in Waymo’s fleet, and Waymo needs to store anywhere from 2.2 petabytes (PB) to 30.4 PB of sensor data per day for the entire fleet.
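For readers who want to rerun the numbers with their own assumptions, the estimate above can be sketched in a few lines of Python. The inputs are the figures quoted in this post (200 vehicles, 8 hours per day, 1.4 TB to 19 TB per hour); the function names and the decimal conversion of 1 PB = 1,000 TB are our own choices for illustration, and the fleet figures remain estimates rather than numbers disclosed by Waymo.

```python
# Back-of-the-envelope estimate using the figures quoted in this post.
# Fleet size and driving hours are third-party estimates, not Waymo data.
FLEET_SIZE = 200          # estimated number of test vehicles
HOURS_PER_DAY = 8         # estimated driving hours per vehicle per day
TB_PER_HOUR_LOW = 1.4     # low end of sensor output, TB per hour
TB_PER_HOUR_HIGH = 19.0   # high end of sensor output, TB per hour
TB_PER_PB = 1000          # assumption: decimal units, 1 PB = 1,000 TB


def daily_tb_per_car(tb_per_hour, hours=HOURS_PER_DAY):
    """Sensor data one test car produces per day, in TB."""
    return tb_per_hour * hours


def daily_pb_for_fleet(tb_per_hour, hours=HOURS_PER_DAY, fleet=FLEET_SIZE):
    """Sensor data the whole fleet produces per day, in PB."""
    return daily_tb_per_car(tb_per_hour, hours) * fleet / TB_PER_PB


if __name__ == "__main__":
    for label, rate in (("low", TB_PER_HOUR_LOW), ("high", TB_PER_HOUR_HIGH)):
        per_car = daily_tb_per_car(rate)
        per_fleet = daily_pb_for_fleet(rate)
        print(f"{label}: {per_car:.1f} TB per car per day, "
              f"{per_fleet:.1f} PB per fleet per day")
```

Changing the fleet size or the hours driven shows how quickly the daily total moves, which is why the range in this post is so wide.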
And this is just the hypothetical amount of data generated and stored by one company. All the major car companies and suppliers across the globe also test drive vehicles with varying levels of autonomy, each collecting data of its own to store, process, and analyze.
Storing autonomous and ADAS data brings new challenges
As autonomous vehicles take advantage of various technologies used in advanced driver assistance systems (or ADAS), testing in this area is broadly categorized under ADAS research and development.
This R&D work brings new challenges in storing the massive volumes of data produced by test cars. These vehicles generally have a huge PC platform in the trunk, loaded with tens or hundreds of terabytes of storage provided by flash solid-state drives (SSD). The SSD arrays are swapped out for new ones as they fill, while data from the used SSDs is transferred to a server rack.
On top of the sheer volumes of data, the dispersed nature of the data collection and storage adds another layer of complexity. Test engineers pool data from tens or hundreds of these roving vehicles distributed across many different locations. Under these conditions, traditional storage methods make it much harder to guarantee that the fleet’s data is always available for processing and analysis. Ensuring that availability would require duplicating the data on servers at each location, which would get extremely expensive.
Not to mention, there’s a lot more to ADAS and autonomous R&D than simply driving test vehicles on actual roads. A lot of simulated driving also goes on in the lab, using algorithms and input data to test and teach vehicles. Methods like software-in-the-loop and hardware-in-the-loop testing (SIL and HIL) allow test engineers to feed collected sensor data to ADAS software or hardware to see how the system behaves.
Traditional data storage methods simply can’t meet all these needs in a cost-effective manner.
A distributed file system up to the task
As leading innovators in file systems development, we’re ready to tackle these massive data storage challenges for our customers. A distributed file system is a convenient way to share data over these geographically separated vehicles in a controlled, cost-efficient manner.
We now offer MooseFS by Tuxera, our fault-tolerant distributed parallel file system designed for extremely demanding data workloads. Our hardware- and OS-agnostic solution offers exceptional performance and scalability on commodity hardware. MooseFS by Tuxera scales linearly together with your data all the way up to 16 exabytes—or roughly 16,000 petabytes—to store more than 2 billion files on a single cluster.
What’s more, MooseFS by Tuxera always runs as a single file system volume, no matter the cluster size or geographical distribution. Plus, our erasure coding technology ensures redundancy while using less raw space compared to ordinary data duplication approaches. And to tackle rapid transfers of massive amounts of data, MooseFS also features its own lightning-fast communication protocol.
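To see why erasure coding needs less raw space than plain duplication, here is a rough sketch of the general arithmetic. It is not a description of how MooseFS by Tuxera is configured: the three-copy replication and the 8 data + 2 parity layout are illustrative assumptions, and the function names are hypothetical.

```python
# Generic storage-overhead arithmetic, not MooseFS-specific behavior.
# The copy count and the 8+2 layout below are illustrative assumptions.


def raw_tb_replication(logical_tb, copies):
    """Raw capacity needed when every byte is stored `copies` times."""
    return logical_tb * copies


def raw_tb_erasure(logical_tb, data_parts, parity_parts):
    """Raw capacity needed when data is split into `data_parts` pieces
    and protected by `parity_parts` additional parity pieces."""
    return logical_tb * (data_parts + parity_parts) / data_parts


if __name__ == "__main__":
    logical_tb = 1000  # hypothetical 1 PB of sensor data
    replicated = raw_tb_replication(logical_tb, copies=3)
    erasure_coded = raw_tb_erasure(logical_tb, data_parts=8, parity_parts=2)
    print(f"3 full copies:      {replicated:.0f} TB raw")
    print(f"8+2 erasure coding: {erasure_coded:.0f} TB raw")
```

Under these assumptions, both layouts survive the loss of two pieces, but the erasure-coded layout needs 1.25 times the logical capacity instead of 3 times.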
Car makers, Tier-1 suppliers, and autonomous development companies—find out how we can help you store, manage, and analyze more ADAS and autonomous data.