.. _performance:

How fast can we write HDF files?
================================

Many factors affect the speed at which we can write HDF files. This article
discusses how this library addresses them and what maximum data rate a PandA can sustain.

Factors to consider
-------------------

.. list-table::
    :widths: 10 50

    * - Trigger frequency
      - Each trigger will send all the captured fields, so the higher the trigger
        frequency the more data is sent
    * - Fields captured
      - Both the number of fields captured and their capture types affect the data
        sent on each trigger
    * - Sample format and processing
      - Server side format and processing of each sample affects the data volume.
        Framed format is selected by this library, with raw or scaled data
        processing available.
    * - Network speed
      - 1 Gigabit ethernet should be used to maximise throughput
    * - File system speed
      - Some local disks and NFS mounts may not be fast enough to sustain
        maximum data rate.
    * - CPU load on the PandA
      - Excessive CPU load on the PandA, generated by extra TCP server clients
        or panda-webcontrol, will reduce throughput
    * - Flush rate
      - Flushing data to disk too often will slow write speed

Strategies to help
------------------

A number of strategies help increase performance. These can be combined to give
the greatest benefit.

Average the data
~~~~~~~~~~~~~~~~

Selecting the ``Mean`` capture mode will activate on-FPGA averaging of the
captured value. ``Min`` and ``Max`` can also be captured at the same time.
Capturing these rather than ``Value`` may allow you to lower the trigger
frequency while still providing enough information for data analysis.

Scale the data on the client
~~~~~~~~~~~~~~~~~~~~~~~~~~~~

`AsyncioClient.data` and `BlockingClient.data` accept a ``scaled`` argument.
Setting this to ``False`` will transfer the raw unscaled data, allowing up to
50% more data to be sent, depending on the datatype of the field. You can then
use the `StartData.fields` information to scale the data on the client.
The `write_hdf_files` function uses this approach.
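
As a rough illustration of client-side scaling, the sketch below applies a
linear ``scale`` and ``offset`` to each column of a raw structured array. The
``FieldInfo`` class, the field name and the numeric values are hypothetical
stand-ins chosen for the example, and the linear form is an assumption rather
than a description of this library's implementation.

.. code-block:: python

    from dataclasses import dataclass
    from typing import Dict, List

    import numpy as np


    @dataclass
    class FieldInfo:
        """Hypothetical stand-in for per-field scaling information."""

        name: str
        scale: float
        offset: float


    def scale_raw(raw: np.ndarray, fields: List[FieldInfo]) -> Dict[str, np.ndarray]:
        """Return value = raw * scale + offset for each named column."""
        return {
            f.name: raw[f.name].astype(np.float64) * f.scale + f.offset
            for f in fields
        }


    # One made-up field holding three raw 32-bit samples
    fields = [FieldInfo("COUNTER1.OUT.Value", scale=0.5, offset=1.0)]
    raw = np.array([(0,), (2,), (4,)], dtype=[("COUNTER1.OUT.Value", "<i4")])
    scaled = scale_raw(raw, fields)

Doing this multiplication on the client moves the floating point work off the
PandA, at the cost of a little extra CPU time on the receiving machine.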

Remove the panda-webcontrol package
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The measures above should get you to about 50MBytes/s, but if more clients
connect to the web GUI then this will drop. To increase the data rate to
60MBytes/s and improve stability you may want to remove the panda-webcontrol
zpkg.

Flush at about 1Hz
~~~~~~~~~~~~~~~~~~

`AsyncioClient.data` accepts a ``flush_period`` argument. If given, it will
squash intermediate data frames together until this period expires, and only
then produce them. This means the numpy data blocks are larger and can be
written to disk, and then flushed, more efficiently. The `write_hdf_files`
function uses this approach.
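
The idea of squashing frames can be sketched independently of the library. The
generator below is a minimal illustration, not the code used by
`AsyncioClient.data`: ``squash_frames``, ``fake_clock`` and ``fake_frames`` are
hypothetical names, and the injected clock exists only so the example runs
without real time passing.

.. code-block:: python

    import time
    from typing import Callable, Iterable, Iterator, List

    import numpy as np


    def squash_frames(
        frames: Iterable[np.ndarray],
        flush_period: float,
        clock: Callable[[], float] = time.monotonic,
    ) -> Iterator[np.ndarray]:
        """Yield concatenated frames at most once per flush_period seconds."""
        pending: List[np.ndarray] = []
        next_flush = clock() + flush_period
        for frame in frames:
            pending.append(frame)
            if clock() >= next_flush:
                # Period expired: hand over one large block
                yield np.concatenate(pending)
                pending = []
                next_flush = clock() + flush_period
        if pending:
            # Emit whatever is left when the acquisition ends
            yield np.concatenate(pending)


    # Simulated acquisition: a 2-sample frame arrives every 0.4 s
    now = [0.0]


    def fake_clock() -> float:
        return now[0]


    def fake_frames() -> Iterator[np.ndarray]:
        for i in range(5):
            now[0] += 0.4
            yield np.full(2, i)


    blocks = list(squash_frames(fake_frames(), flush_period=1.0, clock=fake_clock))

With a 1 second period the five small frames are delivered as two larger
blocks, so the writer performs two writes and flushes instead of five.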


Performance achieved
--------------------

Tests were run with the following conditions:

- 8-core Intel i7 machine as client
- Version 2.1 of panda-server installed on PandA
- PandA and client machine connected to same Gigabit ethernet switch
- 60 byte sample payload
- Using the command-line ``pandablocks hdf`` utility to write data to an SSD

When panda-webcontrol was installed with a single browser connected, the following
results were achieved:

- 50MBytes/s throughput
- PandA CPU usage about 75% (of both cores)
- local client CPU usage about 55% (of a single core)

When panda-webcontrol was not installed, the following results were achieved:

- 60MBytes/s throughput
- PandA CPU usage about 65% (of both cores)
- local client CPU usage about 60% (of a single core)

Attempts to exceed these throughputs caused most scans to fail with `DATA_OVERRUN`.
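
These throughputs can be turned into an approximate trigger frequency for the
60 byte sample payload used in the tests. The arithmetic below is a
back-of-envelope sketch that assumes 1 MByte is 10\ :sup:`6` bytes and ignores
any framing overhead; ``max_trigger_rate_hz`` is a hypothetical helper, not
part of this library.

.. code-block:: python

    SAMPLE_BYTES = 60  # sample payload used in the tests above


    def max_trigger_rate_hz(
        throughput_mbytes_per_s: float, sample_bytes: int = SAMPLE_BYTES
    ) -> float:
        """Triggers per second that fit in the given throughput."""
        # Assumes 1 MByte = 10**6 bytes and no per-frame overhead
        return throughput_mbytes_per_s * 1_000_000 / sample_bytes


    with_webcontrol = max_trigger_rate_hz(50)  # roughly 833 kHz
    without_webcontrol = max_trigger_rate_hz(60)  # 1 MHz

Capturing more fields increases the bytes sent per trigger, so the sustainable
trigger frequency falls in proportion.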

Data overruns
-------------

If there is a `DATA_OVERRUN`, the server will stop sending data. If the ``scaled``
argument is set to False, the most recently received `FrameData` from either
`AsyncioClient.data` or `BlockingClient.data` may be corrupt. This is because the
mechanism the server uses to send raw unscaled data can only detect a corrupt frame
after it has already been sent. Conversely, the mechanism used to send scaled data
aborts before sending a corrupt frame.

The `write_hdf_files` function uses ``scaled=False``, so your HDF file may include some
corrupt data in the event of an overrun.
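
If corrupt samples are unacceptable, one conservative option is to hold back
the most recent frame until the next item arrives, and discard it if the
acquisition ends with an overrun. The sketch below illustrates that idea with
hypothetical stand-in classes (``Frame`` and ``End``) rather than this
library's own types. It is an assumption about how a client might be
structured, not something `write_hdf_files` does.

.. code-block:: python

    from dataclasses import dataclass
    from typing import Iterable, Iterator, List, Optional, Union


    @dataclass
    class Frame:
        """Hypothetical stand-in for a block of captured samples."""

        samples: List[int]


    @dataclass
    class End:
        """Hypothetical stand-in for the end-of-acquisition message."""

        overrun: bool


    def drop_suspect_frame(stream: Iterable[Union[Frame, End]]) -> Iterator[Frame]:
        """Yield frames one step late, dropping the last one on overrun."""
        held: Optional[Frame] = None
        for item in stream:
            if isinstance(item, Frame):
                if held is not None:
                    # A newer frame arrived, so the held one is safe to release
                    yield held
                held = item
            else:
                # End of acquisition: keep the last frame only if no overrun
                if held is not None and not item.overrun:
                    yield held
                held = None


    clean = list(drop_suspect_frame([Frame([1]), Frame([2]), End(overrun=False)]))
    overrun = list(drop_suspect_frame([Frame([1]), Frame([2]), End(overrun=True)]))

The cost is a one-frame delay before data reaches the writer, and the loss of
the final frame after an overrun even when it happened to be intact.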