Every company wants to serve its users quickly and reliably, but achieving low latency at global scale requires a substantial investment of time and resources. Wellhub took a decisive step by adopting a multi-regional architecture for its autocomplete service, built on Elasticsearch, to optimize the search experience. This strategic choice produced a significant improvement in latency and changed how users interact with the platform. Along the way, several techniques and innovations were introduced to keep responses fast and relevant, even in a global environment.
In this article, we explore how Wellhub invested in a multi-regional architecture to offer a low-latency autocomplete service. The solution, developed in Go, uses Elasticsearch to predict user input and takes geolocation into account to sharpen the relevance of search results. By relying on AWS Global Accelerator, the company optimized traffic routing and ensured low-latency connections. A data replication strategy was also put in place so that backups can be restored in different regions, meeting data-refresh requirements. Finally, to further improve perceived performance, Wellhub introduced a pre-fetch endpoint that enhances the user experience while reducing the load on the main infrastructure.
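The pre-fetch idea mentioned above can be sketched in Go as a precomputed prefix index served without touching the main search backend. The term list, index shape, and function names below are illustrative assumptions, not Wellhub's actual implementation:

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// prefetchIndex maps popular query prefixes to precomputed suggestions.
// In a real deployment this subset would be built offline and shipped to
// clients or edge caches; here it is a small in-memory stand-in.
type prefetchIndex map[string][]string

// buildIndex precomputes suggestions for every prefix of each popular
// term, so lookups are O(1) and never reach Elasticsearch.
func buildIndex(popular []string) prefetchIndex {
	idx := make(prefetchIndex)
	for _, term := range popular {
		for i := 1; i <= len(term); i++ {
			p := strings.ToLower(term[:i])
			idx[p] = append(idx[p], term)
		}
	}
	for p := range idx {
		sort.Strings(idx[p])
	}
	return idx
}

// Suggest returns prefetched suggestions for a prefix, or nil when the
// prefix is not covered and the client must fall back to the live service.
func (idx prefetchIndex) Suggest(prefix string) []string {
	return idx[strings.ToLower(prefix)]
}

func main() {
	idx := buildIndex([]string{"yoga", "pilates", "crossfit", "crossfit kids"})
	fmt.Println(idx.Suggest("cro")) // prints: [crossfit crossfit kids]
	fmt.Println(idx.Suggest("zz")) // uncovered prefix: prints []
}
```

Serving such a subset from memory is what makes suggestions feel instant for popular queries while shielding the primary infrastructure from a large share of requests.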
Improving Wellhub’s Autocomplete Service Latency
Wellhub’s autocomplete service relies on a multi-regional architecture designed to maximize speed and responsiveness. With initial latencies reaching up to 600 ms, optimizations were essential to improve the user experience. By deploying on AWS and integrating Elasticsearch, we brought this delay down to acceptable levels.
Strategies Implemented
Several strategies were adopted to reduce latency. First, replicating data across regions brings information closer to users, minimizing transfer delays. Geo-localized queries complement this approach, improving the relevance of results based on the user's location. These methods were coupled with intelligent pre-loading of data subsets, enabling near-instant suggestions.
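A geo-localized query of the kind described above can be expressed with Elasticsearch's function_score and a Gaussian distance decay. The sketch below only builds the request body; the field names ("name.suggest", "location") and decay parameters are assumptions for illustration, not Wellhub's actual mapping:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// buildGeoQuery returns an Elasticsearch request body that matches the
// typed prefix and boosts documents near the user's coordinates.
func buildGeoQuery(prefix string, lat, lon float64) ([]byte, error) {
	body := map[string]any{
		"query": map[string]any{
			"function_score": map[string]any{
				"query": map[string]any{
					"match": map[string]any{
						// Assumed search-as-you-type subfield for prefix matching.
						"name.suggest": map[string]any{"query": prefix},
					},
				},
				"functions": []any{
					map[string]any{
						// Gaussian decay: full score within 5 km of the
						// user, roughly halved around 50 km away.
						"gauss": map[string]any{
							"location": map[string]any{
								"origin": map[string]any{"lat": lat, "lon": lon},
								"offset": "5km",
								"scale":  "50km",
							},
						},
					},
				},
				// Multiply text relevance by the distance decay.
				"boost_mode": "multiply",
			},
		},
	}
	return json.Marshal(body)
}

func main() {
	q, err := buildGeoQuery("yog", -23.55, -46.63)
	if err != nil {
		panic(err)
	}
	fmt.Println(string(q))
}
```

Combining text relevance with a distance decay, rather than hard-filtering by radius, keeps strong textual matches visible even when they are slightly farther away.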
Results Achieved
After these improvements, the service's performance increased significantly: query latency dropped to about 123.3 ms on iOS and 134.6 ms on Android, making the system much more efficient. The architecture has also enabled better use of mobile networks, and the notable reduction in Wi-Fi usage suggests that users now rely on the service over mobile connections more than before.