Our infrastructure is deployed in the AWS us-east-1 region. An Elastic Load Balancer (ELB) is used for our API endpoints. CNAME records were created in Cloudflare DNS to point to the ELB.
We used Datadog Synthetic Monitoring to test our endpoints. Synthetic Monitoring hits an endpoint periodically from locations around the world. We started getting a lot of alerts from Synthetic Monitoring due to DNS timeouts (the DNS timeout defaults to 2s in Synthetic Monitoring). After dashboarding the DNS timing for the Synthetic Monitoring requests, we noticed that the average DNS resolution time was indeed pretty spiky. We then decided to move our DNS to AWS Route 53. This seems to have reduced DNS resolution times significantly, and it also ended the constant DNS timeout alerts we were getting from Synthetic Monitoring.
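For anyone who wants to spot-check DNS timing outside of Datadog, here is a minimal Python sketch of the idea. It is not how Synthetic Monitoring measures DNS internally; the function names `time_dns_lookup` and `summarize` are ours for illustration, and the 2s threshold simply mirrors the default timeout mentioned above.

```python
import socket
import statistics
import time


def time_dns_lookup(hostname, resolver=socket.getaddrinfo):
    """Return how long one lookup of `hostname` takes, in seconds.

    `resolver` is injectable so the timing logic can be exercised
    without touching the network (hypothetical helper, not a Datadog API).
    """
    start = time.perf_counter()
    resolver(hostname, 443)
    return time.perf_counter() - start


def summarize(samples, timeout_s=2.0):
    """Summarize lookup durations and count those above the timeout."""
    return {
        "avg_s": statistics.mean(samples),
        "max_s": max(samples),
        "over_timeout": sum(1 for s in samples if s > timeout_s),
    }
```

Calling `time_dns_lookup` in a loop against your own API hostname and passing the results to `summarize` gives a rough picture. Keep in mind that `socket.getaddrinfo` goes through the operating system's resolver, so local caching can make repeat lookups look faster than a cold lookup from a remote probe.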
While Datadog Synthetic Monitoring does seem to use AWS itself for pinging the endpoints, we didn't expect the change in DNS timing to be so dramatic.
After a few weeks on Route 53, we decided to check again whether Cloudflare still had the DNS issues or whether that had been a one-time thing. So we moved our DNS back from Route 53 to Cloudflare. Unfortunately, the spikes in the graph returned.
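We compared the two setups by looking at the dashboard graph, but the same before/after comparison can be put into numbers. The sketch below is one way to do that in Python, assuming you have exported two lists of DNS resolution times in seconds, one per provider. The helper names `p95` and `compare` are hypothetical, and no real measurements are included here.

```python
import statistics


def p95(samples):
    """95th percentile of the samples (needs at least two values)."""
    # quantiles(n=20) returns 19 cut points; the last one is the 95th percentile.
    return statistics.quantiles(samples, n=20)[-1]


def compare(before, after):
    """Change in average and p95 resolution time, `after` minus `before`."""
    return {
        "avg_change_s": statistics.mean(after) - statistics.mean(before),
        "p95_change_s": p95(after) - p95(before),
    }
```

Looking at p95 next to the average is useful here because occasional spikes, which are what trip a fixed timeout, can hide behind a reasonable-looking mean.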
Given the above results, Cloudflare doesn't seem like a feasible DNS solution when using AWS for your infrastructure. In our measurements, it greatly increased DNS resolution latency compared to AWS Route 53.
P.S. - We are hiring engineers!