TABLE OF CONTENTS
- Replication Database Purpose
- Overview of Provisioning
- What is covered with your DB replica environment
- What can I do to improve our reporting and queries?
- What does MaxContact do to maintain the health of Replicated servers
Replication Database Purpose
The primary purpose of the replication database is to allow customers to retrieve information from the system without impacting the live infrastructure. An instance of this is having an on-site MSSQL server that retrieves data every 15 minutes, enabling the creation of reports in SSRS or PowerBI.
Overview of Provisioning
A replication database server will be part of our HA (High Availability) cluster and as result of this, the replicated database will only have read access. It will also have reduction in permissions such as no admin access to create custom views and some other restrictions listed below. Below are the reasons why you should only pull data from the replication DB.
Custom Queries
Custom queries can be a significant challenge and have a negative impact on database performance. The impact of these queries can lead to uncertainties and make it difficult to create optimal maintenance or optimize queries that we haven't written. Poorly written queries can also have a detrimental impact on the overall performance of the database.
Database Size
This issue is related to the previous point. As your database grows over time, queries that used to run quickly may begin to take longer or stop responding altogether. This is largely due to the need for ongoing optimization and efficient query writing, which is an essential part of general DBA work. However, we cannot be held responsible for custom queries written by customers, as we simply provide the data to be accessed. We suggest that you maintain your queries to ensure they are always running efficiently. As datasets grow it’s no longer efficient to return “all” or “lots” of data with one query. You should always write queries to use WHERE clauses to limit the data returned. If you do need to return a large dataset, then it should be run in portions.
Cost (Resource & Capacity)
Monitoring replicated servers for custom development per site is not a practical solution. For large databases (100GB+), implementing effective optimization based on criteria outside of our control is not feasible.
Indexes
Indexes are a technology that allows us to sort and find data faster. An index allows quicker search results when execute queries. It’s not possible to index everything as this causes a negative effect on the database and infrastructure.
Our focus is always on optimizing databases for the MaxContact solution. . In a replicated environment, the indexes are the same as live environment. It's not feasible to add custom indexes to the live environment to cater to specific requests, as this can significantly impact the live performance of the production environment.
Standardisation
Standardization of the setup and configuration is essential for each database high availability cluster. This approach enhances consistency in change control and release management, leading to a reduction in incidents caused by changes. Having a few customers who operate differently from the majority can potentially result in unforeseen issues. Therefore, we're standardizing the offerings across the ecosystem to ensure the desired level of service and mitigate risks as part of our risk management strategy.
What is covered with your DB replica environment
Despite the challenges, the replication database ultimately exists to provide you with comprehensive information, empowering your business to achieve your desired outcomes from a management information standpoint. We're committed to assisting you in finding the optimal solution and can even assist with setup. There are various methods available for bringing the data into your environment, and we'll provide guidance on selecting the most effective approach. Key points to note:
- Near real-time replication of data with only a few seconds delay.
- MaxContact will support the data to the replica and ensure it is up to date.
- All long running queries will be automatically terminated.
- All high-cost queries will be automatically terminated, this is generally poor written queries.
- The purpose of the Database access is to pull data down.
- The server will be optimised to perform its dedicated task only o This includes indexes and Vacuum jobs o Server performance with RAM and Disc optimisation o NOT for custom queries.
- You will be able to write custom queries against the replica, but you won’t be able to run a reporting suite from it.
- You can connect by all normal database protocols ODBC, various ETL tools, PGAdmin and many more.
To ensure secure data transfer and protect replication traffic from interception or tampering, all new clients connecting to the replication server are required to use SSL (Secure Sockets Layer).
SSL Connection Requirement
Connection to the replication server must be SSL. Set the SSLmode to require.
How to Connect Using SSL
Clients must connect using SSL-enabled PostgreSQL drivers or tools, specifying the appropriate sslmode (require or verify-full) in their connection string.
Example PostgreSQL Connection String (with SSL):
Require:
host=replication.example.com port=5432 dbname=your_db user=your_user password=your_password sslmode=require
Verify-Full:
host=replication.example.com port=5432 dbname=your_db user=your_user password=your_password sslmode=verify-full sslrootcert=system
Recommended sslmode values:
require — SSL encryption is enforced, but the server certificate is not validated.
verify-full — SSL is used, and the certificate must be signed by a trusted Certificate Authority, and the server hostname must match the certificate.
We recommend using verify-full in production environments for maximum security.
Ensure your application supports and is configured for SSL connections using PostgreSQL drivers such as libpq, JDBC, or PgBouncer (with SSL enabled).
What can I do to improve our reporting and queries?
- For customers with advanced reporting needs, take a copy of the data and store it in your own warehouse or other technology. Ensuring that you use the replication database to pull delta datasets rather than requesting all data each time.
- Run queries that you expect to return large datasets in the evening (our replicator databases usually carry less load between 8PM and 7AM)
- Use WHERE clauses on your queries on the partitioned and indexed columns.
- Limit the data you wish to see, Use “SELECT column1, column2” over “SELECT *”
- If you wish to see a sample set of data before writing your queries use the LIMIT statement. This will return a small sample (use define) e.g. SELECT column1, column2 FROM table LIMIT(10)
- Please check and make sure the cost of the query is low by using “EXPLAIN” statements.
- Try not to hit the same table multiple times.
- If more than one table required to extract the data, please use “INNER JOIN” to join the tables.
What does MaxContact do to maintain the health of Replicated servers
- We continually monitor the health of our replicated databases and servers to ensure there is a balance of available resources vs the demand.
- We sample individual queries being run, on occasions we may feedback directly to customers, however this is not feasible as a routine operation.