The Due Diligence Co-Pilot (DDCP) bot provides Capria staff with internal information about potential portfolio companies, covering general company details and relevant legal aspects. A health check system was implemented to confirm that the DDCP bot is functioning correctly and performing as expected. It runs automatically every 6 to 12 hours on the Amazon Web Services (AWS) instance that hosts the DDCP bot.
The Health Check Process
The health check process involves several automated actions:
- Login Simulation: The system simulates a user login to the DDCP bot.
- Query Testing: It then attempts to ask a question, simulating a typical user interaction with the bot.
- Functionality Check: The system exercises the DDCP bot's features to confirm that each one works as expected.
- User Tracking: It monitors the number of user accounts that have been added or deleted.
- Report Generation: Finally, the health check system compiles a report detailing the results of these checks and sends it to the administrators via email.
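The steps above can be sketched as a small runner that executes each check in turn and collects the outcomes into a report. This is a minimal illustration rather than the actual DDCP implementation: the names CheckResult, run_health_check, and format_report are hypothetical, the individual checks are stand-in stubs, and the real login, query, and email steps are not shown.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class CheckResult:
    name: str
    passed: bool
    detail: str = ""


def run_health_check(checks: Dict[str, Callable[[], str]]) -> List[CheckResult]:
    """Run each named check in order; a check passes if it returns without raising."""
    results = []
    for name, check in checks.items():
        try:
            results.append(CheckResult(name, True, check()))
        except Exception as exc:  # record the failure and continue with the next check
            results.append(CheckResult(name, False, str(exc)))
    return results


def format_report(results: List[CheckResult]) -> str:
    """Build the plain-text report body that would be emailed to administrators."""
    lines = [f"[{'PASS' if r.passed else 'FAIL'}] {r.name}: {r.detail}" for r in results]
    failed = sum(1 for r in results if not r.passed)
    lines.append(f"{len(results) - failed} passed, {failed} failed")
    return "\n".join(lines)


# Stub checks standing in for the real login, query, and user-count steps (assumptions).
def failing_query() -> str:
    raise RuntimeError("no answer returned")


demo_checks = {
    "login": lambda: "login simulation succeeded",
    "query": failing_query,
    "user_count": lambda: "user accounts counted",
}
report = format_report(run_health_check(demo_checks))
print(report)
```

Catching the exception per check means one failing step does not prevent the remaining checks from running or the report from being produced.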
The Problem Encountered
During one of these routine health checks, an anomaly was detected: the report listed 446 user IDs associated with the DDCP bot, well above the expected figure of around 300. A discrepancy of roughly 146 accounts, close to 50% more than expected, raised concerns that something unusual was happening.
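A check of this kind can be expressed as a simple comparison of the observed user count against an expected baseline. The sketch below is an assumption about how such a rule might look; the function name user_count_anomaly and the 20% tolerance are hypothetical and are not taken from the DDCP health check itself.

```python
def user_count_anomaly(observed: int, expected: int, tolerance: float = 0.2):
    """Return (is_anomalous, relative_excess) for an observed user count.

    The tolerance of 0.2 (20%) is an illustrative assumption.
    """
    excess = (observed - expected) / expected
    return abs(excess) > tolerance, excess


flagged, excess = user_count_anomaly(446, 300)
print(flagged, f"{excess:.0%}")  # prints: True 49%
```

With the figures from the incident, the observed count exceeds the baseline by about 49%, so it would be flagged under any reasonable tolerance.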
The Suspected Cause: A Bot Attack
The large, unexpected increase in user IDs led to the suspicion that the DDCP application (website) was experiencing a bot attack. A bot attack involves automated programs (bots) attempting to interact with a system and potentially gain unauthorized access to it. In this case, it seemed likely that bots were creating accounts or otherwise interacting with the DDCP bot, which inflated the user ID count.
The Solution: Implementing Captcha
To counteract the suspected bot attack, a security measure was implemented: a Captcha system was added to the login process of the DDCP bot. Captcha, which stands for “Completely Automated Public Turing test to tell Computers and Humans Apart,” is designed to differentiate between human users and automated bots.
How the Solution Works
- Captcha for General Users: When a regular user attempts to log in to the DDCP bot, they are now required to complete a Captcha challenge. This typically involves tasks that are easy for humans but difficult for bots, such as identifying images or solving simple puzzles.
- Bypassing Captcha for Health Check Bot: So that the health check system can still run automatically, it was given a way to bypass the Captcha. The bypass requires a specific combination of three values: a particular email address, its password, and the IP address of the health check. The health check bot is programmed to log in with these values, which the system recognizes as belonging to the legitimate health check.
- Blocking Unauthorized Bots: Any other bot that tries to access the DDCP bot, even with the same email address and password, will not be allowed past the Captcha, because its IP address will differ from the one associated with the health check bot. The system grants the bypass only when the request comes from the health check's IP address and presents the matching credentials.
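The bypass rule described above can be sketched as a single predicate that requires all three values to match. This is a hedged illustration, not the DDCP code: the function name may_bypass_captcha, the email address, the password, and the IP address are all hypothetical placeholders, and the unsalted SHA-256 hash is a simplification (a real deployment would rely on the application's existing salted password verification).

```python
import hashlib
import hmac

# Hypothetical values for illustration only; real values would live in secure configuration.
HEALTH_CHECK_EMAIL = "healthcheck@example.com"
HEALTH_CHECK_PASSWORD_SHA256 = hashlib.sha256(b"example-password").hexdigest()
HEALTH_CHECK_IP = "203.0.113.10"  # address from a range reserved for documentation


def may_bypass_captcha(email: str, password: str, source_ip: str) -> bool:
    """Return True only when email, password, and source IP all match the health check."""
    password_hash = hashlib.sha256(password.encode()).hexdigest()
    email_ok = hmac.compare_digest(email.lower().encode(), HEALTH_CHECK_EMAIL.encode())
    password_ok = hmac.compare_digest(password_hash, HEALTH_CHECK_PASSWORD_SHA256)
    ip_ok = source_ip == HEALTH_CHECK_IP
    return email_ok and password_ok and ip_ok


# Same credentials from a different IP address: the Captcha is still required.
print(may_bypass_captcha("healthcheck@example.com", "example-password", "198.51.100.7"))
```

Any request for which the predicate returns False would simply go through the normal Captcha challenge. If the application sits behind a proxy or load balancer, the source IP would need to come from a trusted source rather than a client-supplied header.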
In summary, the DDCP health check revealed a potential bot attack, indicated by an inflated user ID count. The solution was to add a Captcha to the login process to block automated access, while allowing the health check system to keep operating through a recognized combination of credentials and IP address.
