Device Fingerprinting Explainer: Why It Matters for Web Scraping (2024)


What is device fingerprinting, how is it used, and why is it important for web scraping?

Websites use device fingerprinting to identify devices through a combination of attributes provided by the device itself, including its web browser and device configuration. In simple terms, it’s a unique device ID of sorts. The parameters collected to build the device fingerprint depend on the solution used, but the most common are:

  • operating system
  • screen size and resolution
  • user agent
  • system language and system country
  • device orientation
  • battery level
  • installed fonts and installed plugins
  • system uptime
  • IP address
  • HTTP request headers.
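To make the idea of a “unique device ID” concrete, here is a minimal Python sketch, assuming the attributes listed above have already been collected into a dictionary (the values are made up for illustration). It serializes them deterministically and hashes the result; real fingerprinting products may combine attributes differently.

```python
import hashlib
import json


def fingerprint_id(attributes: dict) -> str:
    """Combine collected device attributes into one identifier.

    Keys are sorted so the same attributes always serialize to the same
    string, regardless of the order in which they were collected.
    """
    canonical = json.dumps(attributes, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Hypothetical attribute values, for illustration only.
device = {
    "os": "Windows 10",
    "screen": "1920x1080",
    "user_agent": (
        "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
        "(KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36"
    ),
    "language": "en-US",
    "country": "US",
    "fonts": ["Arial", "Calibri", "Segoe UI"],
}

print(fingerprint_id(device))
```

With an exact hash like this, changing any single attribute (say, switching the system language) produces a completely different ID.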

Most of the parameters listed above are read from the browser settings, which is why the more popular term “browser fingerprinting” is used with the same meaning.

Want to test which machine features your browser leaks just by visiting a web page? Try the BrowserLeaks online test and see for yourself: it uses JavaScript running in your browser to read them.

Also consider that most common anti-bot solutions use this basic information and enrich it with the results of more complex tests, such as Canvas and WebGL fingerprinting, to add even more detail to these fingerprints.

Here’s the point: the more pieces I add to the fingerprint, the more granular it becomes, because it’s less likely that two users have exactly the same device configuration. This way I can single out a small niche of users with one fingerprint and track their behavior without using cookies.
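One way to quantify “more granular” is information entropy: an attribute value shared by a fraction p of users carries -log2(p) bits of identifying information, and about 33 bits are enough to single out one person among the world’s roughly 8 billion (2^33 ≈ 8.6 billion). The Python sketch below uses made-up population shares and assumes the attributes are independent, which real attributes often are not (screen size and OS, for example, are correlated), so it overestimates uniqueness.

```python
import math


def surprisal_bits(share: float) -> float:
    """Identifying bits carried by an attribute value that a fraction
    `share` (0 < share <= 1) of all users also have."""
    return -math.log2(share)


# Hypothetical population shares for one visitor's attribute values.
observed = {
    "language=it-IT": 0.02,
    "timezone=Europe/Rome": 0.03,
    "screen=2560x1440": 0.05,
    "browser=Firefox": 0.10,
}

# Assuming independence, the bits of the individual attributes add up.
total = sum(surprisal_bits(p) for p in observed.values())
print(f"{total:.1f} bits, i.e. about 1 in {2 ** total:,.0f} users")
```

Adding a fifth attribute shared by 10% of users contributes another 3.3 bits, which is exactly why each extra data point makes the fingerprint more granular.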

Tracking without cookies matters at a time when cookies are under the scrutiny of GDPR, CCPA, and other internet regulations. Users are becoming more aware of cookies and choose to opt out or wipe them from their machines. However, fingerprinting is not only a matter of marketing: anti-fraud and anti-bot vendors are also developing this kind of technology.

In fact, fingerprints that contain incompatible data or unusual configurations can raise red flags when the traffic is analyzed.
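The rules below are a hypothetical, minimal Python sketch of such consistency checks. The field names are assumptions, and real anti-bot systems use many more signals and typically score them rather than applying a few hard rules.

```python
def red_flags(fp: dict) -> list[str]:
    """Return inconsistencies found in a collected fingerprint.

    `fp` is assumed to hold the User-Agent header plus values reported by
    JavaScript: navigator.platform, navigator.maxTouchPoints, and
    screen.width. The field names are hypothetical.
    """
    flags = []
    ua = fp.get("user_agent", "")
    platform = fp.get("platform", "")

    # The OS claimed in the User-Agent should match navigator.platform.
    if "Windows" in ua and not platform.startswith("Win"):
        flags.append(f"UA says Windows but platform is {platform!r}")
    if "Macintosh" in ua and platform != "MacIntel":
        flags.append(f"UA says macOS but platform is {platform!r}")

    # A phone User-Agent normally comes with touch support...
    if "Mobile" in ua and fp.get("max_touch_points", 0) == 0:
        flags.append("mobile UA without touch support")
    # ...and with a phone-sized screen.
    if "Mobile" in ua and fp.get("screen_width", 0) > 1600:
        flags.append("mobile UA with a desktop-sized screen")

    return flags


# A Windows User-Agent reported from a Linux machine: a typical scraper slip.
suspicious = {
    "user_agent": (
        "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
        "(KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36"
    ),
    "platform": "Linux x86_64",
    "max_touch_points": 0,
    "screen_width": 1920,
}
print(red_flags(suspicious))
```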

How is a device fingerprint collected?

As we have seen, a fingerprint is a collection of individual pieces of information gathered when a device connects to a server.

Depending on how this information is collected, we can divide device fingerprinting techniques into active and passive ones.

  • Active fingerprinting is when the server interacts with the device by sending a challenge, a script that the browser has to run. Canvas fingerprinting and WebGL fingerprinting are two examples: the page draws an image off-screen (text and shapes on a 2D canvas, or a 3D scene with WebGL) and hashes the result. Because different hardware, drivers, and fonts render the image slightly differently, the resulting hash string differs, so different hardware produces different fingerprints.
  • Passive fingerprinting is when the server simply gathers the information the browser passes through the different connection layers (TCP/IP, TLS, and HTTP). This includes the IP address, request headers, user agent, screen resolution, operating system, and so on.
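Passive fingerprinting needs no cooperation from the client: the server only reads what already arrives. As a hypothetical Python sketch, assuming the server can see request headers in the order they were received (not every web framework preserves that order), the function below combines header order with a few header values.

```python
import hashlib


def header_signature(headers: list[tuple[str, str]]) -> str:
    """Passive fingerprint built from HTTP request headers as received.

    Browsers tend to send headers in a characteristic order, so the
    sequence of header names is a signal in itself; a few header values
    are mixed in as well.
    """
    names = [name.lower() for name, _ in headers]
    values = {name.lower(): value for name, value in headers}
    parts = [
        ",".join(names),
        values.get("user-agent", ""),
        values.get("accept-language", ""),
        values.get("accept-encoding", ""),
    ]
    return hashlib.sha256("|".join(parts).encode("utf-8")).hexdigest()[:16]


# Hypothetical requests: same headers and values, different order.
browser_like = [
    ("Host", "example.com"),
    ("User-Agent", "Mozilla/5.0 (X11; Linux x86_64; rv:121.0) Gecko/20100101 Firefox/121.0"),
    ("Accept", "text/html,application/xhtml+xml"),
    ("Accept-Language", "en-US,en;q=0.5"),
    ("Accept-Encoding", "gzip, deflate, br"),
]
reordered = [browser_like[0], browser_like[2], browser_like[1], *browser_like[3:]]

print(header_signature(browser_like))
print(header_signature(reordered))  # differs, although every value matches
```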

Modern anti-bot software combines both families of techniques and integrates them with behavioral analysis and AI to detect inconsistencies in the settings of the device and of the scraper.

Mobile device fingerprinting is no different from the usual kind, but it can collect extra data points: battery level, GPS data, mobile carrier, installed apps, and so on.

If you’re curious about the amount and type of information that can be gathered through your web browser, have a look at deviceinfo.me. It’s an online test where you can discover all these details.

As you can see from the following images, the description of your device is quite accurate.


Device Fingerprinting and privacy concerns

As we mentioned before, device fingerprinting solutions are gaining traction beyond the anti-bot industry. They are used for marketing as well, since more and more people are concerned about how websites use cookies. The technology is also widely used in banking to prevent fraud, especially credit card fraud.

Since users can accept or decline a website’s cookies, and can also delete them, cookies are becoming less effective for marketing purposes. Browser fingerprinting remains an effective way to track a user’s device, and it’s hard to counter, at least without special software.

You might be using a fingerprinting tool in your online marketing solution to identify and track users, process that data with machine learning algorithms, and later sell it to an online advertising firm. This might give slightly less granular detail than conventional cookie-based collection, but the flip side is that there’s no way to opt out of fingerprinting (yet).


Very specific clusters of customers can be created. For example, it can be “people from city X using the latest Mac laptop model, with a second screen and graphics card, browsing via Chrome v. 113 with Y extension installed, and connecting with ISP Z”. The variety of data points collected is incredibly wide.

The resulting description is so detailed that the Article 29 Data Protection Working Party, the EU advisory body made up of representatives of national data protection authorities, issued an opinion (Opinion 9/2014) on fingerprinting technologies and European personal data protection law.

To make a long story short: collecting all these pieces of information from the browser to create a unique digital fingerprint makes the ePrivacy Directive applicable to this technology.

This implies that, just as with cookies, website visitors should be informed, and asked for consent, whenever fingerprinting techniques are used on the website, unless those techniques are strictly necessary for the website to work.

How to mask your fingerprint

Looking at the deviceinfo.me website, we can see the different layers of information gathered to create a fingerprint.

Connection layer

A TLS fingerprint is built from the handshake messages that client and server exchange before establishing an HTTPS connection, as seen in our previous article dedicated to TLS fingerprinting. It’s a common technique among major anti-bot providers, as this Cloudflare article shows.
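The article doesn’t name a specific algorithm, but JA3 is a widely used, publicly documented scheme for turning a TLS ClientHello into a fingerprint: it joins the TLS version, cipher suites, extensions, elliptic curves, and point formats as decimal values (dash-separated within a field, comma-separated between fields), skips GREASE values, and takes the MD5 hash of the result. The Python sketch below assumes the ClientHello has already been parsed into integers; the sample values are illustrative, not captured from a real browser.

```python
import hashlib


def is_grease(value: int) -> bool:
    """GREASE values (RFC 8701) look like 0x0a0a, 0x1a1a, ..., 0xfafa."""
    return (value & 0x0F0F) == 0x0A0A and (value >> 8) == (value & 0xFF)


def ja3(version: int, ciphers, extensions, curves, point_formats):
    """Return (ja3_string, md5_hex) for already-parsed ClientHello fields."""

    def field(values) -> str:
        # GREASE values are randomized per connection, so JA3 leaves them
        # out; otherwise the same client would look different every time.
        return "-".join(str(v) for v in values if not is_grease(v))

    ja3_string = ",".join([
        str(version),
        field(ciphers),
        field(extensions),
        field(curves),
        field(point_formats),
    ])
    return ja3_string, hashlib.md5(ja3_string.encode("ascii")).hexdigest()


# Illustrative ClientHello values (0x1A1A, 0x2A2A, 0x3A3A are GREASE).
text, digest = ja3(
    version=771,  # 0x0303, the version field sent in the ClientHello
    ciphers=[0x1A1A, 4865, 4866, 4867, 49195, 49199],
    extensions=[0x2A2A, 0, 23, 65281, 10, 11, 35, 16, 5, 13],
    curves=[0x3A3A, 29, 23, 24],
    point_formats=[0],
)
print(text)
print(digest)
```

Two clients offering the same cipher suites in a different order get different JA3 hashes, which is why an HTTP library’s handshake is easy to tell apart from a real browser’s.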

To avoid raising red flags with a web scraper, you can drive real browsers with Playwright or Selenium, whose TLS handshakes use cipher suites that are not blocklisted, or you can change the ciphers in your Scrapy project’s settings.
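For the Scrapy route, the relevant setting is DOWNLOADER_CLIENT_TLS_CIPHERS, which takes a string in OpenSSL cipher-list format. The snippet below is a hypothetical example for a project’s settings.py; the suites listed are common modern ones, not a copy of any particular browser’s list. Keep in mind that this changes only the offered ciphers (with OpenSSL, this string typically governs TLS 1.2 and earlier): extensions and other handshake details still come from OpenSSL, so the fingerprint will not match a real browser exactly.

```python
# settings.py (Scrapy project): hypothetical example.
# Offer a browser-like set of modern cipher suites instead of OpenSSL's
# default list. The value must be an OpenSSL cipher-list string.
DOWNLOADER_CLIENT_TLS_CIPHERS = ":".join([
    "ECDHE-ECDSA-AES128-GCM-SHA256",
    "ECDHE-RSA-AES128-GCM-SHA256",
    "ECDHE-ECDSA-AES256-GCM-SHA384",
    "ECDHE-RSA-AES256-GCM-SHA384",
    "ECDHE-ECDSA-CHACHA20-POLY1305",
    "ECDHE-RSA-CHACHA20-POLY1305",
])
```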

Of course, the server also knows the IP address of the connecting device and can derive additional information from it, such as your country, state, ISP, and so on.

We can change all these details by using a proxy provider: depending on your needs, you can use datacenter, residential, or mobile IPs.
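With plain HTTP clients, routing traffic through a proxy pool is straightforward. Here is a hypothetical sketch using only Python’s standard library: the proxy URLs are placeholders, and round-robin rotation is just one simple policy (providers often offer their own rotating endpoints). Note that a proxy changes only the IP-derived details; it does nothing about TLS or browser fingerprints.

```python
import itertools
import urllib.request

# Hypothetical proxy endpoints; replace them with your provider's.
PROXIES = [
    "http://user:pass@proxy1.example.com:8000",
    "http://user:pass@proxy2.example.com:8000",
]
_rotation = itertools.cycle(PROXIES)


def next_opener() -> urllib.request.OpenerDirector:
    """Build an opener that sends HTTP and HTTPS traffic through the next
    proxy in the pool, so consecutive requests exit from different IPs."""
    proxy = next(_rotation)
    handler = urllib.request.ProxyHandler({"http": proxy, "https": proxy})
    return urllib.request.build_opener(handler)


# Usage (performs a real request, so it is left commented out):
# with next_opener().open("https://example.com", timeout=10) as resp:
#     print(resp.status)
```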

Browser layer fingerprint

Most of the other information used to fingerprint you comes from your browser settings and from how the browser responds to active Canvas and WebGL fingerprinting.

Typically, if you’re trying to scrape a website protected by a modern anti-bot solution, you cannot use plain HTTP tools like Scrapy; you’ll need a webdriver controlling a real browser, in headful or headless mode, to bypass the protection.

In these cases, provide a plausible machine setup with no discrepancies in its settings. Creating a fingerprint that looks as legitimate as possible is key to extracting data from a website effectively.
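As a hypothetical illustration of “no discrepancies”, the Python sketch below keeps each profile internally consistent (an Italian locale with the Rome time zone, a Windows user agent with a desktop-sized window) and turns it into keyword arguments for Playwright’s browser.new_context(), which accepts user_agent, locale, timezone_id, and viewport options. The profile values are made up; ideally the user agent should also match the browser version you actually launch, and a proxy, if you use one, should exit from the same country.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Profile:
    """A hypothetical, internally consistent identity for one context."""
    user_agent: str
    locale: str
    timezone_id: str
    viewport_width: int   # browser window size, smaller than the screen
    viewport_height: int


PROFILES = {
    # Italian language paired with the Rome time zone and a desktop UA.
    "it_desktop": Profile(
        user_agent=(
            "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
            "(KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36"
        ),
        locale="it-IT",
        timezone_id="Europe/Rome",
        viewport_width=1280,
        viewport_height=720,
    ),
}


def context_kwargs(profile: Profile) -> dict:
    """Keyword arguments for Playwright's browser.new_context()."""
    return {
        "user_agent": profile.user_agent,
        "locale": profile.locale,
        "timezone_id": profile.timezone_id,
        "viewport": {
            "width": profile.viewport_width,
            "height": profile.viewport_height,
        },
    }
```

With Playwright’s sync API, you would then open a context with browser.new_context(**context_kwargs(PROFILES["it_desktop"])).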

As an example, running Chrome in headless mode with Playwright is likely to trigger red flags, since it’s easy to detect. We can see exactly that in the deviceinfo.me screenshots.

In headful mode, the same setup looks perfectly legitimate, although it is much more computationally intensive.

As another example, connections from a server machine typically report no microphone or camera.
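The headless and server-machine examples above map to simple checks a detection script might run. The hypothetical Python function below assumes the in-page values have already been collected and sent to the server; the field names are made up, and no single signal is conclusive on its own.

```python
def automation_signals(js: dict) -> list[str]:
    """Flag common traces of automated or server-side browsers.

    `js` is assumed to contain values a detection script could read in the
    page: navigator.userAgent, navigator.webdriver, the number of devices
    returned by navigator.mediaDevices.enumerateDevices(), and the number
    of entries in navigator.plugins.
    """
    signals = []
    if "HeadlessChrome" in js.get("user_agent", ""):
        signals.append("user agent contains HeadlessChrome")
    # The WebDriver spec sets navigator.webdriver to true under automation.
    if js.get("webdriver") is True:
        signals.append("navigator.webdriver is true")
    # Server machines usually have no audio or video hardware attached.
    if js.get("media_devices", 0) == 0:
        signals.append("no microphone or camera")
    if js.get("plugins", 0) == 0:
        signals.append("no plugins reported")
    return signals
```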

But if you use an anti-detect browser like GoLogin, which lets you create custom profiles that mimic different hardware and OS setups, you can send more plausible information to the server and build a more legitimate fingerprint.

Final remarks on Device Fingerprinting

We have seen how much information is transmitted with a simple browser connection to a website, and how it can be used both to track users’ behavior and to detect bots.

All these device fingerprinting techniques raise privacy concerns, especially in countries where this information can be used to restrict personal freedom.

Luckily, we have several tools in our toolbelt to mask our real online traces. Some of them can be used for web scraping projects as well as for general private browsing.

In the next post of The Web Scraping Club, we’ll walk through some code examples that mask our device fingerprint to bypass the most common anti-bot solutions.

This article was kindly provided by Pierluigi Vinciguerra, web scraping expert and founder of Web Scraping Club. Follow this link to see the original post.

Download GoLogin here and explore the scraping world with our free plan!


FAQs

What is device fingerprinting used for?

In a web analytics context, device fingerprinting is used to accurately identify unique visitors and recognize returning ones. Advertisers and AdTech vendors also use device fingerprinting to identify and track users across the Internet, which allows them to create user profiles and target them with personalized ads.

Why is device fingerprinting important?

Device fingerprinting plays a pivotal role in fraud prevention and security. By tracking the unique attributes of a device, businesses can identify and flag suspicious activities. If a user's device suddenly exhibits drastically different characteristics, it could indicate a potential security breach.

How effective is browser fingerprinting?

The effectiveness of browser fingerprinting lies in its ability to generate a high-entropy identifier, meaning that the collected data points create a profile complex and unique enough to distinguish one user from millions of others online.

What is the problem with device fingerprinting?

Cons: Data privacy concerns

Browser fingerprinting can conflict with privacy regulations, because users are often not aware of how much of their data is transferred or who gets hold of it. Users also cannot simply clear their fingerprint the way they clear cookies, which worries even the most privacy-conscious users.

How does device fingerprinting help protect private information?

Device fingerprinting helps protect private information by creating a unique identifier for each device that accesses a website or application, so that access from an unrecognized device can be flagged for extra verification. This identifier is based on a variety of factors, such as the device type, operating system, browser type, and IP address.

Why are fingerprints important in criminal investigations?

One of the most important uses for fingerprints is to help investigators link one crime scene to another involving the same person. Fingerprint identification also helps investigators to track a criminal's record, their previous arrests and convictions, to aid in sentencing, probation, parole and pardoning decisions.

What is the advantage of using biometric fingerprints?

Benefits of fingerprint biometrics

Fingerprints cannot be lost or misplaced, and they are always with the person. Fingerprints are hard to fake and more secure than a password or token. Fingerprint patterns cannot be guessed and are non-transferable.

What data does browser fingerprinting collect?

A browser fingerprint is a compilation of data about the user’s device, including hardware, operating system, browser, and configuration [7]. Since a device with a less-than-common setup can quickly be singled out on the internet, this also raises privacy concerns.

Is device fingerprinting reliable?

By identifying unique device characteristics, device fingerprinting is an effective fraud prevention method.

How accurate is forensic fingerprint analysis?

Studies show that fingerprint analysis is not 100 percent accurate. While people may believe that everyone has a unique fingerprint, this has never been proven, and statistical analyses have not been able to determine the probability that two people have the same fingerprints.

What is the biggest problem with fingerprint evidence?

Fingerprints are not secure

Fingerprints can be copied, which is one reason they have been planted at crime scenes: people with grievances against others can lift those people’s fingerprints and plant them at a crime scene. This flaw of fingerprint identification has been noted for years.

What can hackers do with your fingerprint?

They can commit financial fraud through digital wallets and online banking: hackers can use your fingerprints to unlock digital wallets or access credit card and bank account details.

What are the disadvantages of fingerprint scanners?

Disadvantages of Fingerprint Scanners

Despite being a more secure method of authentication, fingerprint scanners are not entirely impervious to breaches. High-quality replicas of an individual's fingerprints can potentially deceive these scanners.

What are the benefits of device fingerprinting?

  • Cross-device tracking: businesses can use device fingerprinting to track user behavior across multiple devices, which allows for more targeted and effective advertising campaigns.
  • No cookies required: unlike traditional tracking methods, device fingerprinting works even when users disable cookies in their browsers.

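Is browser fingerprinting legal?

It is generally a legal practice, but you may need to disclose your use of fingerprinting in your terms and conditions or privacy policy, since some visitors may wish to opt out (as with a cookie policy). In the EU, the ePrivacy Directive also requires consent unless the fingerprinting is strictly necessary for the website to work.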

How does device fingerprinting work?

Device fingerprinting is a method to identify a device using a combination of attributes provided by the device configuration and by how the device is used. The attributes collected to build the device fingerprint can vary depending on who is building it.

Why is fingerprint recognition important?

Fingerprint biometrics can be used to authenticate a person based on matching the data within a system, or it can be used as a method of identity verification to ensure that a person is who they say they are. Fingerprint biometrics can add an extra layer of security over password and token security measures.

What are the benefits of fingerprint scanners?

Fingerprint Scanner Advantages

Fingerprint scanners considerably ease the login process as a smooth, time-efficient authentication solution. Users simply need to touch their finger on the scanner for authentication, which takes seconds – a significant improvement over entering passwords or PINs numerous times each day.

Why are fingerprints essential for security systems?

Fingerprint technology serves as one of the most commonly employed forms of biometric access control, widely utilized by mobile users to unlock their devices. Unique characteristics inherent in fingerprints provide secure and accurate verification for access control.

Why is DNA fingerprinting important and useful?

DNA fingerprinting is an important part of forensic science. Although it can’t tell you exactly who committed a crime, it can help narrow down a list of suspects based on how well their DNA matches the samples found at the crime scene.
