Graphics processing units (GPUs) are dedicated, highly parallel hardware accelerators that were originally designed to accelerate the creation of images. More recently, folks have been looking at GPUs to accelerate other workloads, such as database analytics and transaction processing (OLTP). Although GPUs offer little or no benefit for OLTP-style workloads, they have been shown to accelerate analytics.
So, what kind of benefits can you expect from running the Oracle Database on a GPU and where are you likely to see these benefits?
The Oracle Database has a long history of adopting new technologies as they become available and allowing customers to take advantage of these technologies transparently in their existing applications. Oracle is continuing this tradition by taking advantage of the latest hardware and software technologies to provide dramatically faster analytics performance.
Oracle has already released Oracle Database In-Memory, which uses columnar in-memory formats to greatly accelerate analytics. The columnar in-memory algorithms make extensive use of the SIMD Vector instructions that are already present in CPUs. SIMD Vector instructions accelerate analytics by processing many data elements with a single instruction, and they benefit from full access to the very large caches and high memory bandwidth of current CPU sockets. Another advantage is that SIMD Vector instructions are present in all existing CPUs, so they add no cost, complexity, or power usage on top of the existing hardware.
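To make the idea concrete, here is a toy model in Python, assuming 8 lanes (for example, eight 32-bit integers in a 256-bit register). Python cannot issue real SIMD instructions, and this is not how Oracle Database In-Memory is implemented; the sketch only counts compare "instructions" to show why handling many elements per instruction cuts the work for a simple filter.

```python
# Assumption: 8 lanes, e.g. eight 32-bit integers in one 256-bit register.
LANES = 8


def scalar_filter(values, threshold):
    """Scalar model: one compare instruction per element."""
    matches, instructions = [], 0
    for v in values:
        instructions += 1
        if v > threshold:
            matches.append(v)
    return matches, instructions


def simd_filter(values, threshold, lanes=LANES):
    """Toy SIMD model: one vector compare covers `lanes` elements.

    The compare yields a per-lane mask, and matching elements are kept.
    As in scalar_filter, only the compares are counted.
    """
    matches, instructions = [], 0
    for start in range(0, len(values), lanes):
        chunk = values[start:start + lanes]
        instructions += 1  # one vector compare for the whole chunk
        mask = [v > threshold for v in chunk]
        matches.extend(v for v, keep in zip(chunk, mask) if keep)
    return matches, instructions


data = list(range(100))
scalar_rows, scalar_ops = scalar_filter(data, 89)
vector_rows, vector_ops = simd_filter(data, 89)
assert scalar_rows == vector_rows
print(scalar_ops, vector_ops)  # 100 13
```

Both versions select the same rows, but the lane-based model issues 13 compares instead of 100; real SIMD code gets a similar reduction in instruction count while reading the same data.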
Oracle continues to rapidly add new SIMD Vector algorithms to the database to take further advantage of these specialized instructions. SIMD Vector processing has already delivered very large analytics performance gains in the Oracle Database, and customers should expect additional gains from new and improved uses of these instructions in future database releases. What’s great about this approach is that these gains are transparent to applications and require no additional hardware or effort from the customer beyond installing the software.
Further, Oracle has been actively working with Intel and other chip vendors for many years to add additional SIMD vector instructions to CPUs for the specific purpose of accelerating Oracle Database algorithms. Some of these instructions are now becoming available, and more instructions will become available as new CPU chips are released in the next few years.
GPUs offer the potential to further accelerate analytic processing through two mechanisms:
- Adding more parallel processing
- Using higher-bandwidth, but much smaller, specialized memory called High Bandwidth Memory (HBM)
Oracle is actively working with the major GPU vendors to implement database algorithms that use these devices. But current-generation GPUs have several disadvantages:
- They are heavily oriented towards floating point and other numeric processing. Therefore, the large majority of processing power available in these devices is not useful for accelerating database algorithms.
- These devices sit on the PCI bus and don’t have direct access to the server’s DRAM memory. Instead, GPUs have their own local high bandwidth memory, but this local memory is one to two orders of magnitude smaller than the server memory. All data that a GPU processes must be moved back and forth across the PCI bus between the GPU and the main CPUs.
It is important to understand the basic architectural benefits and tradeoffs of GPUs in order to see where they provide the most value. With their huge number of parallel computation engines, these devices excel at tasks that require large numbers of computations on small amounts of data. GPUs are extremely effective for blockchain applications because these require billions of computations on a few megabytes of data. GPUs are great for deep learning, which performs repeated computational loops on megabytes to gigabytes of data. GPUs are great for graphics because three-dimensional imaging requires millions of computations on every image. The pattern is the same in each case: lots of computation on modest amounts of data.
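One standard way to reason about this pattern is the roofline model, which the post does not name: attainable throughput is the lesser of peak compute and arithmetic intensity (operations per byte moved) times memory bandwidth. The peak and bandwidth figures below are illustrative assumptions rather than measurements of a specific GPU, and the intensities are rough, hypothetical values chosen only to contrast compute-heavy work with a scan-heavy query.

```python
# Illustrative, assumed device figures (not a specific product).
PEAK_GOPS = 15_000.0    # assumed peak arithmetic throughput, in Gop/s
HBM_GB_PER_S = 900.0    # assumed local (HBM) memory bandwidth, in GB/s


def attainable_gops(ops_per_byte, peak=PEAK_GOPS, bandwidth=HBM_GB_PER_S):
    """Roofline model: throughput is capped by compute or by memory traffic."""
    return min(peak, ops_per_byte * bandwidth)


# Rough, hypothetical arithmetic intensities (operations per byte).
workloads = {
    # e.g. dense matrix multiply, which reuses each loaded value many times
    "deep learning": 50.0,
    # one compare + one add per 4-byte value = 2 ops / 4 bytes
    "filter + sum (analytics)": 0.5,
}

for name, intensity in workloads.items():
    gops = attainable_gops(intensity)
    print(f"{name}: {gops:,.0f} Gop/s ({gops / PEAK_GOPS:.0%} of peak)")
```

Under these assumptions the compute-heavy workload reaches the compute ceiling, while the filter-and-sum scan is limited by memory bandwidth to a few percent of peak, even before any data has to cross the PCI bus.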
Database analytics has a completely different pattern of data usage.
Analytic queries typically perform a small number of simple calculations on large amounts of data, often hundreds of gigabytes to petabytes. For example, a typical analytic query applies a simple predicate (e.g. filter sales by date or region) and then performs a simple aggregation function (e.g. sum or average).
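That query shape, a simple predicate followed by a simple aggregate, can be sketched in Python over a hypothetical columnar table (one list per column). The table, column names, and values are invented for illustration; in SQL the call below corresponds roughly to `SELECT SUM(amount), AVG(amount) FROM sales WHERE region = 'EMEA' AND sale_date BETWEEN DATE '2024-01-01' AND DATE '2024-02-29'`.

```python
from datetime import date

# Hypothetical columnar table: each column is a separate list, rows aligned by index.
sales = {
    "region": ["EMEA", "APAC", "EMEA", "AMER", "EMEA"],
    "sale_date": [date(2024, 1, 5), date(2024, 1, 9), date(2024, 2, 2),
                  date(2024, 2, 3), date(2024, 3, 15)],
    "amount": [120, 75, 310, 42, 99],
}


def sum_and_avg_where(table, region, start, end):
    """Scan every row once, doing only a few comparisons and additions per row."""
    total, count = 0, 0
    for r, d, amount in zip(table["region"], table["sale_date"], table["amount"]):
        if r == region and start <= d <= end:  # the predicate
            total += amount                     # the aggregation
            count += 1
    average = total / count if count else None  # None when no rows match
    return total, average


print(sum_and_avg_where(sales, "EMEA", date(2024, 1, 1), date(2024, 2, 29)))
# (430, 215.0)
```

The work per row is tiny, so at hundreds of gigabytes the cost is dominated by reading the data, not by the arithmetic.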
Note that the analytics usage pattern is the exact opposite of the GPU sweet spot described above. Because the data being processed is much larger than can fit in local GPU memory, it must be moved back and forth across the PCI bus. This limits total throughput to the PCI bus bandwidth, which is dramatically lower than the local memory bandwidth. This doesn’t mean that GPUs provide no benefit for analytics, but users should not expect the dramatic gains seen in other applications. It is just not architecturally possible.
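A back-of-the-envelope model makes this bottleneck concrete. The figures below are assumptions chosen to be roughly representative (about 16 GB/s for a PCIe 3.0 x16 link in one direction, about 900 GB/s for a GPU with HBM2, and 32 GB of local GPU memory), and the model ignores compression, overlap of transfer and compute, and caching across queries.

```python
# Assumed, roughly representative figures (GB/s and GB); real hardware varies.
PCIE_GB_PER_S = 16.0   # approx. PCIe 3.0 x16, one direction
HBM_GB_PER_S = 900.0   # approx. local HBM2 bandwidth of one GPU
GPU_MEM_GB = 32.0      # assumed local GPU memory capacity


def gpu_scan_seconds(data_gb, gpu_mem_gb=GPU_MEM_GB,
                     pcie=PCIE_GB_PER_S, hbm=HBM_GB_PER_S):
    """Lower bound on the time for one GPU to read `data_gb` once (no compute cost)."""
    if data_gb <= gpu_mem_gb:
        # Data is assumed to be resident in GPU memory already.
        return data_gb / hbm
    # Otherwise every byte must cross the PCI bus, the slowest link.
    return data_gb / min(pcie, hbm)


for size_gb in (16, 1_000):
    print(f"{size_gb:>5} GB: {gpu_scan_seconds(size_gb):8.3f} s")
```

With these numbers, a 1 TB scan takes about 62.5 s over PCIe, versus about 1.1 s if the same data could be read from HBM, a gap of more than 50x.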
Oracle and other vendors have found that some database analytics algorithms can in fact run faster on GPUs than with conventional processing methods. However, care should be taken when reading performance comparisons because the analytic landscape is changing rapidly.
As mentioned before, databases are increasingly taking advantage of SIMD Vector instructions. The frequently published comparisons showing huge advantages for GPUs usually contrast traditional database algorithms with new, highly optimized GPU algorithms. Furthermore, these comparisons often use easily available but unoptimized and unparallelized open-source databases that are orders of magnitude slower than commercial databases for analytics. Oracle’s internal benchmarking shows that comparing GPUs against current algorithms optimized for vector instructions greatly narrows the GPU performance advantage. New vector-optimized algorithms and parallelization that Oracle will be releasing will narrow the gap further, and in most cases we find that, with these new algorithms, a standard two-socket server delivers performance similar to that of a server with eight GPU cards.
In addition to the big changes in software algorithms coming in the analytics area, big changes are coming in hardware. PCI buses will get faster, and future GPUs will reduce their PCI bus communication disadvantages by adding direct high-bandwidth links to the main CPUs. On the other hand, future CPUs may add support for High Bandwidth Memory, eliminating one of the main advantages of GPUs.
In summary, Oracle is actively improving its analytic algorithms by further leveraging SIMD Vector instructions and improving parallelism. Oracle Database is already dramatically faster for analytics than it was a few years ago and will get much faster in coming releases. Oracle is working with both conventional CPU vendors and GPU vendors to add new hardware capabilities that specifically optimize database processing. Current GPUs can be shown to run some analytic algorithms faster, but achieving these advantages outside a benchmark environment is challenging because these algorithms work for only a subset of analytic functions, and data needs to be moved back and forth across the PCI bus. Oracle is also actively working on adapting its database algorithms to take transparent advantage of GPUs and will release these algorithms if we find that the performance gains are sufficient and sustainable.