When we talk about fraud on a website, we tend to think immediately of online payments. However, a website can be abused in several other ways that are just as dangerous and that cause huge losses to web professionals. These frauds are now automated by malicious robots, called bots, that can perform a variety of actions on a website.
Content theft is the most common fraud today.
According to the latest estimates, content theft costs about 3 billion dollars each year. Indeed, the content of a website is what makes its reputation. Moreover, with the digitalization of society, content creation has become a real business. The sale of information is now digitalized, which greatly exposes companies specialized in this field to the risk of scraping on their platforms. For example, sites that continuously publish commodity prices, legal news sites, or platforms that offer detailed analysis of financial markets are easy targets. In short, the targets are all those whose main activity is content, which is why they want to preserve their intellectual property at all costs. Yet this asset is sometimes as easy to steal as a simple copy and paste.
This kind of fraud has very harmful consequences that are not visible at first glance. The stolen content is generally reused by a third party to overshadow the original site and win over its customers (visitors). Beyond that, the site's reputation is tainted, and visitors become reluctant to entrust it with their personal information.
Spammers are bots that generally try to infiltrate the member area of a site in order to gain access to its comment section or forum. Once inside, they post messages or spread links that redirect to a competing site, advertising links, or even links to malware or scams.
Infected traffic and distorted analysis:
Malicious bots are now programmed to crawl all kinds of sites and send a phenomenal number of requests, which not only slows down legitimate traffic but also costs a great deal of money. Since every request has a cost, the targeted site ends up financing its own fraudsters. In other words, processing the traffic generated by these bots consumes additional cloud resources, hence the need to invest more in hosting solutions such as AWS or Azure.
In addition, the analysis of site traffic is generally used to improve the services offered and to better understand user expectations. When the traffic is partially or mostly generated by malicious bots, this analysis is distorted, the decisions based on it may be wrong, and the site risks losing its attractiveness.
Credit card fraud:
Credit card fraud is the scourge of the current century, and the losses it causes continue to grow. In 2019, they were estimated at more than $35 billion worldwide. This figure illustrates that it is one of the most difficult types of fraud to counter and that it does not target a particular field: any web professional who accepts online payments can be targeted at any time. All the more so as the profits fraudsters make from it push them to be ever more ingenious.
How to fight online fraud:
The market for fighting online fraud is constantly evolving. The companies concerned attach great importance to it and devote a substantial budget to it.
There are several techniques for fighting online fraud, and several tools that detect online fraudsters and ban them. The goal is the same, but the approach differs from one tool to another, even if all of them are based on one of these principles:
- Rules engine:
This technique consists in defining a set of rules that will determine if a user is fraudulent or not.
Rule-based models generate a final score that the site uses to validate or ban the user in question.
This score is obtained by adding up the points of every rule that applies. Let's take an example: a customer account older than 6 months counts for -500 points, and an IP address that corresponds to a proxy server counts for +400 points. In our case, the higher the score, the more likely the user is to be a fraudster. In the end, the decision to ban or validate depends on the threshold that has been set.
This type of model is particularly effective for simple fraud cases. That said, it is rather easy for today's fraudsters to defeat: they make sure they comply with every rule they can in order to avoid detection.
In other words, rule engines on their own seem insufficient in the face of the creativity of current fraudsters, hence the need to introduce new, more effective models.
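To make the scoring mechanism concrete, here is a minimal sketch of a rules engine built on the two example rules above. The `Visitor` fields, the 182-day approximation of "6 months", and the ban threshold of 300 points are assumptions chosen for illustration; they do not describe any particular product.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Visitor:
    # Hypothetical fields; a real engine would look at many more signals.
    account_age_days: int
    ip_is_proxy: bool


@dataclass(frozen=True)
class Rule:
    name: str
    points: int                         # added to the score when the rule applies
    applies: Callable[[Visitor], bool]


# The two rules from the example above ("6 months" approximated as 182 days).
RULES = [
    Rule("account older than 6 months", -500, lambda v: v.account_age_days > 182),
    Rule("IP address is a proxy server", +400, lambda v: v.ip_is_proxy),
]

# Assumed threshold: the article does not give a value.
BAN_THRESHOLD = 300


def fraud_score(visitor: Visitor, rules=RULES) -> int:
    """Add up the points of every rule that applies to the visitor."""
    return sum(rule.points for rule in rules if rule.applies(visitor))


def should_ban(visitor: Visitor, threshold: int = BAN_THRESHOLD) -> bool:
    """The higher the score, the more suspicious the visitor."""
    return fraud_score(visitor) >= threshold


old_account_behind_proxy = Visitor(account_age_days=400, ip_is_proxy=True)
new_account_behind_proxy = Visitor(account_age_days=10, ip_is_proxy=True)

print(fraud_score(old_account_behind_proxy), should_ban(old_account_behind_proxy))
print(fraud_score(new_account_behind_proxy), should_ban(new_account_behind_proxy))
```

The sketch also shows the weakness discussed above: a fraudster who knows the rules only has to use an aged account to cancel out the proxy penalty.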
- Role of machine learning in the fight against fraud:
The use of machine learning in the fight against fraud has become standard practice, and most online tools rely on it. Here is how it works.
It consists of training learning algorithms on a large amount of pre-processed data in order to generate models capable of predicting in real time whether a user is fraudulent or not.
These models are indeed a huge advantage over rule engines, as they identify hidden correlations in the data, and automatically detect possible fraud scenarios.
Developing an ML-based model involves several steps. One of the most important is obviously the collection of the data used to train the algorithm.
Many platforms offer datasets for training algorithms. That being said, in order to detect fraudsters, the model must be customized to the target site. In other words, it should be trained on a database specific to the site that includes a range of technical data (IP addresses, connection dates, requests, routes, etc.). The richer and more diversified the database, the more accurate the final model will be.
In short, the detectors available online all work in more or less the same way.
A client site subscribes to the tool and must provide it with a log database containing the technical data of its visitors. The tool then trains a model customized for the client and deploys it, so that the client site can call it as an API for each visitor request.
This solution clearly relies on the data provided by the client site.
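The train-then-predict workflow described in this section can be sketched in a few lines. The example below is only an illustration under strong assumptions: the two features (a request rate scaled between 0 and 1, and a proxy flag), the eight labeled rows, and the choice of logistic regression are all hypothetical, and real detectors rely on far richer data and models.

```python
import math


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic(rows, labels, lr=0.5, epochs=2000):
    """Fit a logistic regression with plain batch gradient descent."""
    n = len(rows)
    weights = [0.0] * len(rows[0])
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * len(weights)
        grad_b = 0.0
        for x, y in zip(rows, labels):
            p = sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)
            err = p - y                      # gradient of the log loss w.r.t. z
            for j, xi in enumerate(x):
                grad_w[j] += err * xi
            grad_b += err
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias


def fraud_probability(model, x) -> float:
    """Score one visitor; this is what a prediction API would return."""
    weights, bias = model
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)


# Toy training set, invented for this sketch: [scaled request rate, uses proxy].
ROWS = [
    [0.10, 0.0], [0.20, 0.0], [0.15, 1.0], [0.30, 0.0],   # legitimate visitors
    [0.90, 1.0], [0.80, 0.0], [0.95, 1.0], [0.70, 1.0],   # bots
]
LABELS = [0, 0, 0, 0, 1, 1, 1, 1]

model = train_logistic(ROWS, LABELS)
print(round(fraud_probability(model, [0.9, 1.0]), 3))   # fast visitor behind a proxy
print(round(fraud_probability(model, [0.1, 0.0]), 3))   # slow visitor, no proxy
```

Unlike the rules engine, nobody writes the weights by hand here: they are learned from the site's own labeled data, which is why the quality of that data matters so much.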
- The CloudFilt solution; or how to combine rules engines and ML:
It is clear that fraud detection is a real technological challenge.
Machine learning is certainly the most effective way to fight this kind of fraud, which is why the detection models must be optimized as much as possible. This obviously involves careful processing of the database used to train the model.
This is the approach taken by CloudFilt, a platform that protects against malicious bots.
By analyzing customer requests on both the front end and the back end, we build a relevant dataset through filtering: we keep only the data that is useful and likely to discriminate between a fraudulent user and a legitimate one. We then enrich it with other technical data that reflect the visitor's intention, such as the use of a proxy server or Tor, the location of each user derived from the IP address, and much more.
This dataset is constantly updated to cover as many scenarios as possible.
To further enhance security and to anticipate scenarios that are undetectable, or at least not yet detectable, by our ML model, we opted to combine the ML model with a rules engine based on visitor behavior analysis.
In other words, we analyze how visitors interact through several criteria:
- Duration of a session:
A session ends when the visitor remains inactive for a certain period of time. The time of the last request then tells us how long the visitor stayed; an abnormally long session would point to non-human behavior.
- Query speed:
In each session, a visitor makes a number of requests.
The speed of these queries is a good indicator of the user’s credibility.
- Average number of requests per page:
On a typical website, the path of a visitor is sometimes quite unpredictable. Nevertheless, a visitor performs a limited number of requests on each page. Performing a large number of requests on a single page is therefore considered suspicious behavior.
- Similarity in login dates:
Bots are usually scheduled to log in at a specific time. CloudFilt therefore checks whether the session start times of the same visitor are similar to one another.
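As an illustration, the four criteria above could be computed from one visitor's request log roughly as follows. This is a sketch under assumptions, not CloudFilt's actual implementation: the event format (timestamp in seconds, page), the 30-minute inactivity timeout, and the function names are all hypothetical.

```python
from collections import Counter
from statistics import pstdev

SESSION_TIMEOUT_S = 30 * 60   # assumed inactivity timeout
SECONDS_PER_DAY = 86_400


def split_sessions(events, timeout=SESSION_TIMEOUT_S):
    """Group (timestamp_s, page) events into sessions separated by inactivity."""
    sessions = []
    for ts, page in sorted(events):
        if sessions and ts - sessions[-1][-1][0] <= timeout:
            sessions[-1].append((ts, page))
        else:
            sessions.append([(ts, page)])
    return sessions


def session_metrics(session):
    """Duration, query speed, and average number of requests per page."""
    duration = session[-1][0] - session[0][0]
    requests = len(session)
    pages = Counter(page for _, page in session)
    return {
        "duration_s": duration,
        # Sessions shorter than one second are treated as lasting one second.
        "requests_per_s": requests / max(duration, 1),
        "avg_requests_per_page": requests / len(pages),
    }


def start_time_spread(sessions):
    """Standard deviation of session start times within the day, in seconds.

    A value close to zero means the visitor always starts at the same time.
    Wrap-around at midnight is ignored in this sketch.
    """
    starts = [session[0][0] % SECONDS_PER_DAY for session in sessions]
    return pstdev(starts) if len(starts) > 1 else None


# A made-up visitor that hits the same page every day at 03:00:00.
events = [
    (10_800, "/prices"), (10_801, "/prices"), (10_802, "/prices"), (10_803, "/prices"),
    (97_200, "/prices"), (97_201, "/prices"), (97_202, "/prices"),
]
sessions = split_sessions(events)
first = session_metrics(sessions[0])
spread = start_time_spread(sessions)
print(len(sessions), first, spread)
```

Each value can then feed a behavioral rule, for example by comparing it with a limit tuned on the site's own traffic.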
The combination of rule engines and ML models seems to give very promising results, filtering out as many fraudsters as possible without banning legitimate visitors (false positives). Thanks to direct access to logs, detection is becoming more and more sophisticated and covers many possible scenarios.