Channel: Mali Developer Center » Jakub Lamik

ARM Mali-T628 or how performance sometimes outstrips promises

In the fast-paced technology world we are used to hearing about improvements from one product generation to the next. In the past, we have looked in great detail at the various metrics used to compare GPU performance for graphics and compute use cases. This time we want to celebrate the recent release of the Samsung Galaxy Note 3 and the Samsung Galaxy Note 10.1 (2014), based on the Samsung Exynos 5 Octa (5420) platform with the ARM® Mali™-T628 GPU, and the fact that “performance takes a huge step forward” when compared to previous devices on the market. We also want to take this opportunity to highlight energy-efficiency optimisations and double-check whether we have delivered on our promise of a 50% improvement. So in this blog we will look more closely at best practices for benchmarking performance in battery- and area-constrained devices and for comparing improvements in energy efficiency.

Attached Image

Let’s start with the three golden rules of benchmarking GPUs for energy efficiency.

Screen resolution
Let’s imagine two devices with two different form factors. One of them has a 720p screen while the other is equipped with an ultra-high-definition 2.5K screen. In each frame the latter device has to process four times as many pixels as the former (see graph below). This means that in order to deliver the same frame rate it has to provide four times the throughput, and it will potentially consume a proportionally higher amount of energy. That explains why most industry-standard benchmarks tend to use off-screen buffers with a fixed resolution (for instance 1080p in the case of GLBenchmark) and are therefore able to provide an apples-to-apples comparison between devices of different form factors.
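The pixel arithmetic behind that comparison can be checked with a short sketch. It assumes 1280×720 for the 720p screen and 2560×1440 for the 2.5K screen; the article does not state exact panel dimensions, so both are illustrative, and the function names are hypothetical.

```python
def pixels_per_frame(width: int, height: int) -> int:
    """Number of pixels the GPU has to shade for one full frame."""
    return width * height


def pixel_ratio(a: tuple, b: tuple) -> float:
    """How many times more pixels resolution `a` has than resolution `b`."""
    return pixels_per_frame(*a) / pixels_per_frame(*b)


# Assumed panel dimensions (not given in the article).
HD_720P = (1280, 720)
QHD_2_5K = (2560, 1440)
FULL_HD_1080P = (1920, 1080)  # fixed off-screen size used by GLBenchmark

print(pixel_ratio(QHD_2_5K, HD_720P))       # 4.0
print(pixel_ratio(FULL_HD_1080P, HD_720P))  # 2.25
```

Under these assumptions, rendering everything to the same 1080p off-screen buffer gives both devices an identical per-frame pixel load, which is what makes the comparison fair.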

Attached Image

Performance
To render a single frame of given content, a GPU has to process a specific amount of data and consume a given amount of energy. In the typical use case it will be required to deliver 60 frames per second for any content visible on the screen. Even if the GPU is capable of running faster than 60 fps, the frame rate will be capped by the screen refresh rate of 60 Hz. As we pointed out earlier, typical graphics benchmarks often use off-screen buffers to compare performance at the same resolution. This also allows tests to run at frame rates beyond 60 fps, so devices can be compared at their top-end performance.
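A minimal sketch of why the refresh-rate cap matters for benchmarking follows. It is a deliberately simplified model (the function name is hypothetical, and real display pipelines may behave differently when the GPU falls below the refresh rate): on-screen, the reported frame rate is the lower of what the GPU can render and what the display can show.

```python
def displayed_fps(gpu_fps: float, refresh_hz: float = 60.0,
                  offscreen: bool = False) -> float:
    """Frame rate a benchmark would report under a simple cap model.

    On-screen rendering cannot exceed the display refresh rate;
    off-screen rendering is not tied to the display at all.
    """
    if offscreen:
        return gpu_fps
    return min(gpu_fps, refresh_hz)


# Two hypothetical GPUs that look identical on-screen but not off-screen.
for capability in (75.0, 120.0):
    print(displayed_fps(capability), displayed_fps(capability, offscreen=True))
```

In this model both GPUs report 60 fps on-screen, while the off-screen run exposes the difference between them.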

Attached Image

Use Case
Modern mobile devices enable different use cases with diverse graphics requirements, particularly when it comes to the complexity of the content being processed. Obviously a GPU has to do much less to process a single frame of a user interface or a casual game than it would when running a high-end game or a graphics benchmark designed to stress-test the graphics system. Industry-standard 3D graphics benchmarks provide a good indication of what we can expect from AAA-class content. However, we also have to look at test cases that match more casual use, e.g. playing Fruit Ninja or scrolling through the Android™ UI. In the past we covered why metrics such as triangles per second or pixels per second don’t necessarily map onto a balanced real-life use case, and why it is always important to actually run an application that is representative of the use cases we want to characterise.

Attached Image

Energy efficiency
Taking into account the above considerations on performance, screen resolution and use case it’s much easier to realise why average power on its own is not a useful metric when comparing how a given GPU performs within a given power budget. As it stands, this metric ignores performance and resolution and without any further context it says nothing about the efficiency of the GPU.

If we had to construct a metric that takes those factors into account, we would have to provide an Average Power for a given Performance at a given Screen Resolution running a given Use Case. Ignoring the excessive use of upper case in the above sentence, clearly we would need to simplify it a little bit, and we have two choices:

  • FPS per watt: the average performance achieved within a given average power budget when running a given use case at a given screen resolution
    • An example would be a GPU that delivers 50 frames per second within 1 watt of average power when running GLBenchmark 2.7 T-Rex 1080p Off-screen
  • mJ per frame: the energy used per frame for a given use case at a given screen resolution
    • An example would be a GPU that uses on average 20 mJ per frame when running GLBenchmark 2.7 T-Rex 1080p Off-screen
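The two metrics are reciprocals of each other up to a unit conversion, which a few lines of arithmetic make explicit. This is a sketch using the example figures above (50 fps at 1 W of average power); the function names are hypothetical.

```python
def fps_per_watt(fps: float, avg_power_w: float) -> float:
    """Frames delivered per second for each watt of average power."""
    return fps / avg_power_w


def mj_per_frame(fps: float, avg_power_w: float) -> float:
    """Energy per frame in millijoules (1 W = 1000 mJ per second)."""
    return 1000.0 * avg_power_w / fps


def mj_per_frame_from_fps_per_watt(value: float) -> float:
    """Convert an FPS-per-watt figure directly to mJ per frame."""
    return 1000.0 / value


print(fps_per_watt(50.0, 1.0))               # 50.0
print(mj_per_frame(50.0, 1.0))               # 20.0
print(mj_per_frame_from_fps_per_watt(50.0))  # 20.0
```

A higher FPS-per-watt figure always corresponds to a lower mJ-per-frame figure, so either can be reported as long as the use case and resolution are stated alongside it.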

Leaving behind the academic debate on the advantages of joules per task or even joules per pixel over performance per watt, in both of the cases above we are talking about exactly the same GPU and we have just chosen to present it in two different ways.

Second generation of Mali-T600 series delivers 50% energy efficiency improvement… and then some

Up to now this article has been far from the promised celebration of the market-leading Samsung devices with Mali-T628, and it’s time to change that. So how did we do with the second generation of the Mali-T600 series, and how do we compare to other devices on the market? To answer that we will compare performance and energy efficiency between a 2012 tablet with the Mali-T604 and a 2013 one with the Mali-T628 MP6.

Attached Image

As you can see, we not only delivered the 50% energy efficiency improvement, but in reality achieved more than 100%. Obviously the semiconductor fabrication used to create a System on a Chip is constantly improving, and new process nodes bring higher clock speeds and lower power consumption. So we have to take into account manufacturing process improvements as well as the benefits of using ARM Artisan™ Physical IP, which enables efficient implementation of such complex SoC designs. But even with that we can safely assume that we have delivered our promised improvement in the second generation of the Mali-T600 series.
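To put those percentages in terms of the two metrics discussed earlier, the sketch below converts an FPS-per-watt improvement into the equivalent change in energy per frame. The input figures are purely hypothetical placeholders, not measurements from the devices compared here, and the function names are illustrative.

```python
def efficiency_improvement_pct(old_fps_per_watt: float,
                               new_fps_per_watt: float) -> float:
    """Percentage gain in FPS per watt from the old device to the new one."""
    return (new_fps_per_watt / old_fps_per_watt - 1.0) * 100.0


def energy_per_frame_change_pct(improvement_pct: float) -> float:
    """Change in energy per frame implied by an FPS-per-watt improvement."""
    return (1.0 / (1.0 + improvement_pct / 100.0) - 1.0) * 100.0


# Hypothetical FPS-per-watt figures chosen only to illustrate the arithmetic.
print(efficiency_improvement_pct(20.0, 30.0))  # 50.0
print(efficiency_improvement_pct(20.0, 40.0))  # 100.0
print(energy_per_frame_change_pct(100.0))      # -50.0
```

In other words, a 50% gain in FPS per watt corresponds to about a third less energy per frame, and a gain of more than 100% means each frame costs less than half the energy it did before.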

It is also worth bearing in mind that GPU energy efficiency is only half of the story. ARM big.LITTLE™ multi-processing technology and the ARM Cortex™-A series of processors deliver high performance and efficiency across the entire system, and memory bandwidth contributes to overall system power as well. Mali GPUs with Job Manager, Hierarchical Tiler, Transaction Elimination, Adaptive Scalable Texture Compression and other upcoming features enable further end-to-end savings, resulting in ARM-based systems with longer battery life and a lower thermal budget.

So to summarise, the Mali-T628 GPU has delivered even more than we promised, and in the next few blogs we are going to explain how ARM technology leadership and Mali GPUs further reduce system power.

