Load on disks

SSD vs HDD

The platform has six or seven distinct uses of persistent storage (the file system). Each is characterized by its read rate, write rate, the ratio between the two, how long the written data is retained, how often it is accessed afterwards, and how critical read and write latency are.
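
As an illustration only, such a usage profile could be described roughly as follows; the field names and example values in this sketch are assumptions, not the platform's actual configuration schema.

```python
from dataclasses import dataclass

# Hypothetical description of one storage "use"; field names and example
# values are assumptions for this sketch, not the platform's real schema.
@dataclass
class StorageProfile:
    name: str                   # what this partition is used for
    read_rate_mb_s: float       # typical read rate
    write_rate_mb_s: float      # typical write rate
    retention: str              # how long written data is kept
    access_frequency: str       # how often it is read back
    read_latency_critical: bool
    write_latency_critical: bool

# Example: real-time logs are written heavily, read rarely, and the write
# path must never stall.
realtime_logs = StorageProfile(
    name="real-time logs",
    read_rate_mb_s=1.0,
    write_rate_mb_s=10.0,
    retention="days",
    access_frequency="rare (troubleshooting)",
    read_latency_critical=False,
    write_latency_critical=True,
)
print(realtime_logs.write_rate_mb_s / realtime_logs.read_rate_mb_s)  # write/read ratio -> 10.0
```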

These uses are tied to specific (different) paths in the file system. More precisely, they are bound to aliases, and the aliases in the configuration are bound to paths in the file system inside the container. When the system is installed on a server, a docker volume is created for each of these partitions and attached to the container. This way, each of them can be mapped to a different physical partition.
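
A minimal sketch of that idea, assuming a hypothetical alias-to-host-path mapping; the alias names, host paths, and image name below are invented for illustration and are not the installer's actual commands.

```python
# Hypothetical alias -> (container path, host partition) mapping; the names
# and paths are invented for illustration, not taken from the real installer.
aliases = {
    "logs":     ("/var/log/era",                 "/mnt/ssd1/era_logs"),
    "logstore": ("/var/lib/era_files/logstore",  "/mnt/hdd1/era_logstore"),
    "recpath":  ("/var/lib/era_files/recpath",   "/mnt/hdd2/era_recpath"),
}

# Each alias becomes a named docker volume bind-mounted to its host partition,
# then mounted into the container at the path the configuration expects.
mounts = []
for alias, (container_path, host_path) in aliases.items():
    print(f"docker volume create --driver local --opt type=none "
          f"--opt o=bind --opt device={host_path} era_{alias}")
    mounts.append(f"-v era_{alias}:{container_path}")

print("docker run " + " ".join(mounts) + " era/platform:latest")
```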

If all disks are SSDs with a high TBW rating, so much the better: it eliminates outright the problem of delays caused by an overflowing access queue. If you have a choice, choose SSD.

Large-capacity server SSDs are quite expensive (we are not talking about 1 TB, of course, but about larger capacities and higher TBW ratings), so HDD is an acceptable compromise in some cases.

  • On SSD - all the working data with a lot of reading and a lot of writing but small volumes or a constant rewrite cycle: logs, call recording, mixing, the object database, script files, autoprovisioning, web server temporary files, and many others.

  • On HDD - everything that accumulates: the archive of call recordings (if they are not sent to external S3 or NFS storage), the log archive, and possibly an archive database.

SSD partitions are for fast access and intensive overwriting (even though SSD lifespan depends directly on the intensity of overwriting). Platform utilization statistics for SSDs show that writes are an order of magnitude more intensive than reads; take this into account when choosing disks. Each heavily loaded system server can write around 100 GB of logs per day, plus a comparable amount of auxiliary data. With conversation recordings this comes to roughly 1 TB per day (a rough estimate). Even so, the wear limit of the lower segment of server SSDs, with a TBW of ~7 PB, is at least 20 years away.
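
The arithmetic behind this estimate, as a small sketch; the daily write volume is the rough 1 TB/day figure from the paragraph above.

```python
# Rough TBW lifetime estimate using the figures quoted above.
tbw_pb = 7                # endurance of a low-end server SSD, in PB written
daily_write_tb = 1.0      # ~1 TB/day including conversation recordings (rough)

lifetime_days = tbw_pb * 1000 / daily_write_tb   # 7 PB = 7000 TB
print(f"{lifetime_days:.0f} days ~= {lifetime_days / 365:.0f} years")
# -> 7000 days ~= 19 years, i.e. roughly the 20-year figure quoted above
```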

The database gradually overwrites its own volume. It can be placed on HDD, but under intensive database load this becomes the factor that limits further growth of system performance. If you are prepared to spread different domains, and even collections within a domain, across different database servers, HDD may be an acceptable option. You can also gain speed by gathering several HDDs into a wide RAID-5 array or one of its derivatives.

Examples

If the postgresql database is used intensively and the HDD it sits on is 100% loaded, the next read request may be delayed. If the system's database load reaches this level, the options are: split the database into parts and move them to different servers for different domains/collections, organize RAID-5, or move the database to SSD.

When a recording dialog is created, the file is opened synchronously for writing. If the disk is busy and the open stalls for more than 3 seconds, the call fails. Writing is handled by the servers running MG instances, and disk load is distributed in proportion to how instances are placed across servers. If writing goes to an SSD partition, this problem does not occur in practice at any call volume: the computational limits of a single server are reached first. With HDD it is the other way around.
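
A sketch of this failure mode, assuming a hypothetical 3-second budget around the synchronous open; the limit and error handling below are illustrative, not the platform's actual code.

```python
import time

RECORD_OPEN_BUDGET_S = 3.0   # assumed budget before the call is considered failed

def open_recording(path: str):
    """Open a recording file synchronously; fail if the disk stalls the open
    for longer than the budget (illustrative sketch only)."""
    started = time.monotonic()
    f = open(path, "wb")     # synchronous open; blocks while the disk queue drains
    elapsed = time.monotonic() - started
    if elapsed > RECORD_OPEN_BUDGET_S:
        f.close()
        raise RuntimeError(f"disk stalled for {elapsed:.1f}s - the call would be dropped")
    return f
```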

The working directory of the system hosts the mnesia object database, so the working directory belongs on SSD. Being able to write quickly, without queuing, is close to critical for data integrity. If mnesia shares a disk with other processes, it may interfere with them, although not with the system itself: its writes are always deferred, and reading happens entirely into RAM at system startup.

When a conversation has been mixed, it is moved to a connected S3 or NFS storage, or distributed across the system's own servers if no external storage is connected. There is no queue to disturb anyone, and access for listening to a recording or viewing a log in the archive is not time-critical. HDD copes with this easily.

Storage location for recorded conversations

For HDD, 1 TB is a nominal minimum: enough that the lack of free space does not become a concern for a while, and after a few months of statistics you can work out what volume is really needed to keep logs for N weeks and conversation recordings for M years. You can certainly get by with less, but let the planned load be your guide.
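
A sketch of the estimate this implies, with invented example rates; replace them with figures from your own statistics collected over a few months.

```python
# Capacity estimate from measured daily rates (example numbers, not real data).
log_gb_per_day = 40     # measured average log volume per day
rec_gb_per_day = 15     # measured average volume of mixed recordings per day

keep_logs_weeks = 4     # keep logs for N weeks
keep_recs_years = 3     # keep conversation recordings for M years

needed_gb = (log_gb_per_day * 7 * keep_logs_weeks
             + rec_gb_per_day * 365 * keep_recs_years)
print(f"~{needed_gb / 1000:.1f} TB needed for {keep_logs_weeks} weeks of logs "
      f"and {keep_recs_years} years of recordings")   # -> ~17.5 TB
```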

Connecting external storage for archived conversations is more efficient and easier to manage in larger systems. With external storage, new storages can be created regularly and current files are written to them; as long as the previous storages remain available, older files are still read from them.

If recordings are left in the default location on the platform servers, storage is effectively limited to a single volume (since it is constrained by the docker volumes; this could be developed further, but it is not a priority). You can delete data older than N years, or, if the server is physical, merge several disks into one volume - this must be done before installation or when reinstalling the system on that server.

Partitions in the file system and their minimum sizes

The partitioning scheme is the same on all servers; the actual load depends on the configuration.

The current installer provides the following root paths for partitions (a sizing check is sketched after this list):

  • /usr/lib/era - component and application files, including update archives. ~20 GB (SSD or HDD) is fine.

  • /var/lib/era - working directory: object database, mixing. ~80 GB is fine.

  • /var/log/era - real-time logs. ~100 GB.

  • /var/lib/era_files/rectemp - real-time recordings - stored for 6 hours. ~20 GB SSD.

  • /var/lib/era_files/recpath - mixing results; storage is local per domain unless external storage is configured - NFS storage is expected to be used. In one- or two-server systems, however, NFS is usually not used and this sits on the server's own disk, which then doubles as long-term recording storage. ~20 GB if recordings are offloaded to external storage, or ~1 TB if archives are kept here without NFS.

  • /var/lib/era_files/local - local storage directory. Temporary files live here and are used heavily by the web server; it is also the default directory for attachments of all collections of the product data model (emails in particular). HDD is enough as long as it can handle the load. If the system is heavily loaded in every direction, temporary files should be moved to SSD (because the write intensity is high) and collection attachments to other disks (because the volume is large).

  • /var/lib/era_files/syncroot - 20 GB SSD; in reality 1-2 GB. This is data handled by the automatic synchronizer between servers: script media files, public web pages, certificates, patches, autoprovisioning templates, static media files (for conferences, hold, parking, voicemail, voicing of numerals), attachment files of domain entities (custom applications and microservices of the product layer), and project files of all kinds - if storage is deliberately created here or scripts place files here.

  • /var/lib/era_files/logstore - HDD - archive storage of logs. Its size is set in the configuration. 500 GB corresponds roughly to one week at average intensity.

  • /var/lib/era_files/siteshare - NFS. Used in project setups as shared storage between the servers of one site. For example, in scripts: a file is placed by one script and used by another.

  • /var/lib/era_files/globalshare - NFS. Same, but in multisite systems between all sites.

  • /var/lib/era_files/a - additional partitions assigned on different media, so that load can be transferred from one disk to another by changing the configuration on the fly. They hold no data of their own, but can take over from any of the partitions listed above.

  • /var/lib/era_files/b - same

  • /var/lib/era_files/c - same
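
A sketch of a pre-install sizing check against the minimums listed above; the mount points and thresholds are taken from this list, and whether you run it against host or container paths depends on your layout.

```python
import shutil

# Rough minimum sizes from the list above, in GB; adjust to your own layout.
minimum_gb = {
    "/usr/lib/era": 20,
    "/var/lib/era": 80,
    "/var/log/era": 100,
    "/var/lib/era_files/rectemp": 20,
    "/var/lib/era_files/recpath": 20,      # ~1000 if archives stay local
    "/var/lib/era_files/syncroot": 20,
    "/var/lib/era_files/logstore": 500,
}

for path, need_gb in minimum_gb.items():
    try:
        total_gb = shutil.disk_usage(path).total / 1e9
    except FileNotFoundError:
        print(f"{path}: not mounted yet")
        continue
    status = "ok" if total_gb >= need_gb else f"too small, need ~{need_gb} GB"
    print(f"{path}: {total_gb:.0f} GB ({status})")
```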

Splitting storage into such small partitions has a potential downside: it precludes reallocating space between them. For example, some servers have no roles with object databases at all, but write a lot of logs (for example, SIP signaling traffic).

These paths are inside the container. On the host they are all bound to volumes during installation; the default host paths are under /opt. The /usr/lib/era directory is not moved to a volume and remains inside the container, which lives somewhere in the host's system folders (most likely under /var/lib).

Incidentally, real-time logs can be written to an HDD, but ONLY if that HDD is dedicated to the task. As soon as you start combining it with the database or with recording, harmful cross-dependencies through the access queue follow.

Writing is asynchronous for real-time logs, the recording archive, the log archive, the object database, and the mixer. Writing is synchronous for the postgresql database and for attachments of the dynamic object model. Reading is synchronous for media files and for dynamic model objects.

File system and inodes

For standard partitioning, e.g. ext4, it is important to keep the inode limit in mind. If many small files are placed on the disk, the inodes may run out before the free space does. Each directory and each file occupies one inode. For example, if the disk stores an email message history, then regardless of the size of each individual email, each one effectively consumes more than 100 KB of the disk's capacity because of the several directories and the raw data file it creates.
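
The arithmetic behind the >100 KB figure, assuming the default ext4 inode ratio of one inode per 16 KiB of capacity and a hypothetical seven inodes (a few nested directories plus the raw data file) per stored email.

```python
# Why a small email "costs" more than 100 KB of disk capacity in inode terms.
bytes_per_inode = 16 * 1024   # default ext4 inode ratio: one inode per 16 KiB
inodes_per_email = 7          # hypothetical: a few nested directories + the raw data file

budget_kib = inodes_per_email * bytes_per_inode / 1024
print(f"~{budget_kib:.0f} KiB of the inode budget per email")   # -> ~112 KiB
```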

The platform tracks inode usage on the disks where its partitions reside. When the thresholds are exceeded, the administrator receives a message as part of the system_state.
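
The platform's own monitoring is built in; purely as an illustration, the same check could be done by hand with os.statvfs, as in this sketch (the threshold and the paths checked are assumptions).

```python
import os

INODE_USAGE_THRESHOLD = 0.90   # assumed alert threshold, not the platform's value

def inode_usage(path: str) -> float:
    """Fraction of inodes already used on the file system containing `path`."""
    st = os.statvfs(path)
    if st.f_files == 0:        # some file systems do not report inode counts
        return 0.0
    return 1.0 - st.f_ffree / st.f_files

for mount in ("/var/log/era", "/var/lib/era_files/local"):
    usage = inode_usage(mount)
    flag = " - raise an alert" if usage >= INODE_USAGE_THRESHOLD else ""
    print(f"{mount}: {usage:.0%} of inodes used{flag}")
```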