# SSO Privacy Code Readme

# Preconditions
1. Ubuntu 22.04 LTS
2. Python installed (recommended version >= 3.10) (`apt install python3`)
3. Pip installed (recommended version >= 24.1) (`apt install python3-pip`)
4. pipenv (recommended version >= 2024.0.1) (`pip3 install pipenv`)

**Please note:** When installing packages with `pip`, the following warning may occur:
```
WARNING: The script virtualenv is installed in '/home/<username>/.local/bin' which is not on PATH.
```
If this is the case, please consider adding the given path to your PATH variable. Otherwise, commands installed via `pip` must be called with the complete path to the executable like `/home/<username>/.local/bin/pipenv`. 

To include this path in your PATH variable on a fresh Ubuntu 22.04 LTS installation, you can simply add `PATH=$PATH:~/.local/bin` at the end of the file `~/.profile`. Afterwards, please reboot the system. If you are unable to reboot the system, you can also temporarily add `~/.local/bin` to your PATH variable by running the following command: `export PATH=$PATH:~/.local/bin`. Note that this only affects the current terminal session, so you must repeat it each time you open a new terminal. Therefore, it is recommended to use the `~/.profile` file.
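
To confirm that the change took effect, you can run a small check. The following is a minimal Python sketch using only the standard library; the function name `local_bin_on_path` is hypothetical and not part of SSO-Monitor. It reports whether `~/.local/bin` is one of the PATH entries and whether `pipenv` can be found without its full path.

```python
import os
import shutil
from pathlib import Path


def local_bin_on_path(path_value: str, home: Path) -> bool:
    """Return True if <home>/.local/bin is one of the entries in path_value."""
    target = (home / ".local" / "bin").resolve()
    for entry in path_value.split(os.pathsep):
        if not entry:
            continue  # skip empty entries, e.g. from a trailing ':'
        # Expand a literal '~' in case the entry was not expanded by the shell.
        if Path(os.path.expanduser(entry)).resolve() == target:
            return True
    return False


if __name__ == "__main__":
    on_path = local_bin_on_path(os.environ.get("PATH", ""), Path.home())
    print(f"~/.local/bin on PATH: {on_path}")
    # shutil.which searches the PATH entries for an executable named pipenv.
    print(f"pipenv found at: {shutil.which('pipenv')}")
```

If the first line prints `False` after a reboot, check that the `PATH=...` line is really at the end of `~/.profile`.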

# How to set up SSO-Monitor
1. Download the SSO-Monitor tool from our [website](https://files.sso-privacy-leak.info/share/W6rrqEnb/1-Scanning/Code/SSO-Monitor.zip) and extract it. It already includes all of our developed extensions (see the `sso-monitor-extensions` folder [here](https://files.sso-privacy-leak.info/share/W6rrqEnb/1-Scanning/Code/sso-monitor-extensions/)).
2. Install dependencies and set up Playwright
   1. Open a terminal, switch to the directory `sso-monitor/app`, and execute the following commands:
      1. `pipenv install`
      2. `pipenv run playwright install-deps`
      3. `pipenv run playwright install` 
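
If you prefer to script the setup, the three commands can be run in order from Python. This is a minimal sketch using only the standard library; the names `SETUP_COMMANDS` and `run_setup` are hypothetical and not part of SSO-Monitor. It assumes `pipenv` is on your PATH (see Preconditions) and stops at the first command that fails.

```python
import subprocess
from pathlib import Path

# The setup commands listed above, in the order they must run.
SETUP_COMMANDS = [
    ["pipenv", "install"],
    ["pipenv", "run", "playwright", "install-deps"],
    ["pipenv", "run", "playwright", "install"],
]


def run_setup(app_dir: Path) -> None:
    """Run each setup command inside app_dir (e.g. sso-monitor/app)."""
    if not app_dir.is_dir():
        raise NotADirectoryError(f"{app_dir} is not a directory")
    for command in SETUP_COMMANDS:
        print("Running:", " ".join(command))
        # check=True raises CalledProcessError on a non-zero exit code,
        # so later steps never run on top of a failed installation.
        subprocess.run(command, cwd=app_dir, check=True)
```

For example, call `run_setup(Path("sso-monitor/app"))` from the directory into which you extracted the tool.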

# Run the scans
The following process offers multiple options: scanning for partial leaks on one specific website (1.1), scanning for partial leaks on a range of the Tranco list (1.2), and scanning for full leaks (2). Please note that scanning for full leaks requires a previous partial leak scan and the execution of the evaluation script as described in [2 - Evaluation on the website](https://sso-privacy-leak.info/#evaluation). Alternatively, you can use the results (`exported_lreq.csv`) [from our analysis](https://files.sso-privacy-leak.info/share/W6rrqEnb/2-Evaluation/Partial%20Leaks/Data%20Tranco%20Top%201M/exported_lreq.csv) to reproduce our full leak scan results.


1. To run a scan for partial leaks, you can use one of the following example run configurations:

   
   1. Single website: `run-configurations/partial-leaks/single-page-configuration.json` ([see here](https://files.sso-privacy-leak.info/share/W6rrqEnb/1-Scanning/Code/run-configurations/partial-leaks/single-page-configuration.json))

      This configuration is used to scan for partial privacy leaks on a specified website. 
      * `tranco_id` The Tranco ID of the domain in the Tranco list you use. This ID is not relevant for the scanning process itself; however, if it is missing, the evaluation step later on will fail!
      * `page` The website to scan.
      * `rest_time_on_page` How long the browser remains idle after loading the page before performing any actions or closing the page.
      * `simulate_interactions` Whether the browser should simulate user interactions.
      * `cookie_banner_procedure` Whether an extension should be loaded to accept cookies. **Important**: If you want to use this option, you have to extract the extension from `cookieblock-v1.1.1_0.zip` ([see here](https://files.sso-privacy-leak.info/share/W6rrqEnb/1-Scanning/Code/cookieblock-v1.1.1_0.zip)) to the folder `sso-monitor/app/modules/browser/extensions/chromium/` and add `"chromium/cookieblock-v1.1.1_0"` (with the quotes) to the array at `browser_config -> extensions` of the JSON run configuration file.
      * `artifacts_config` Which artifacts are stored in the result file.
      * `browser_config` Configuration of the scanning browser instance. For example, if you want to watch the scan process, you can set the `headless` flag to `false`. Note: If you do not have a desktop environment, this option must be `true`. The provided configuration matches our default test setup.

      
      To start scanning a page, you can use the following command:

      `pipenv run python cli.py -o /tmp/scanning-results-partial-leaks privacy --analysis-config <path-to-the-config-file> --thread-count 1 --max-run-time 300`
      * `-o` The output directory for results
      * `--analysis-config` Path to the configuration file for one specific page
      * `--thread-count` How many threads to use to run scans in parallel
      * `--max-run-time` Time in seconds after which a task is aborted  

      An example result file can be found in our [partial leak artifacts](https://files.sso-privacy-leak.info/share/W6rrqEnb/1-Scanning/Partial-Leak-Scans/partial-leak-scan-data-top-1m/), for example [12-twitter.com.json](https://files.sso-privacy-leak.info/share/W6rrqEnb/1-Scanning/Partial-Leak-Scans/partial-leak-scan-data-top-1m/1-50000/12-twitter.com.json).

   2. Tranco range: `run-configurations/partial-leaks/tranco-list-configuration.json` ([see here](https://files.sso-privacy-leak.info/share/W6rrqEnb/1-Scanning/Code/run-configurations/partial-leaks/tranco-list-configuration.json))

      This configuration can be used to scan a range of Tranco domains. 
      * `scan_config` Defines which Tranco list to use and which range to process
      * `analysis_config` Please refer to the description of `single-page-configuration.json` above

      To start scanning a tranco list domain range, you can use the following command:

      `pipenv run python cli.py -o /tmp/scanning-results-partial-leaks privacy --scan-config <path-to-the-config-file> --thread-count 3 --max-run-time 300`
      * `-o` The output directory for results
      * `--scan-config` Path to the configuration file for a range of Tranco domains
      * `--thread-count` How many threads to use to run scans in parallel
      * `--max-run-time` Time in seconds after which a task is aborted

      Our results for all 1 million websites can be downloaded here: [partial-leak-scan-data-top-1m.zip](https://files.sso-privacy-leak.info/share/W6rrqEnb/1-Scanning/Partial-Leak-Scans/partial-leak-scan-data-top-1m.zip)

   **Result Information:**  
   The results are json files containing the following information:
   * `analysis_config`: This field contains all the settings used for the scan request, e.g. the content of `single-page-configuration.json` or the `analysis_config` from `tranco-list-configuration.json`.
   * `analysis_result`: This field contains the results of the scan process:
      * `page`: The page that was scanned.
      * `found_sso_requests`: If an SSO leak is found (in the form of an HTTP message or post message), it is added to this array. Each request is documented by the fields `idp` (string), `lreq` (string), and `user_action_performed` (boolean): `idp` contains the identity provider for which the leak occurred, `lreq` contains the request that triggered the detection, and `user_action_performed` indicates whether the leak occurred before or after the simulated user actions were performed. If a page does not trigger any leaks, this array is empty.
      * `thrown_exception`: If the scan did not complete normally, this field gives information about the underlying problem. It contains the exception that was thrown before the scan aborted. If this field is `null`, the scan was successful.
      * `duration`: The time in seconds it took to complete the scan.
      * `screenshot_zlib_compressed` and `har_zlib_compressed`: Depending on the analysis configuration (`artifacts_config`), these fields contain the screenshot and / or the HAR file of the scan. If no artifacts were saved due to the configuration settings, these fields are `null`.
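
Parts of the partial leak workflow can be scripted. The following is a minimal sketch, not part of SSO-Monitor, with two hypothetical helpers. `prepare_single_page_config` adapts a copy of the example `single-page-configuration.json` to another website; it assumes that the documented keys `page`, `tranco_id` (assumed to be numeric), and `browser_config` are top-level keys of that file, and it leaves `cookie_banner_procedure` untouched. `summarize_partial_results` walks an output directory and counts scans, aborted scans (non-null `thrown_exception`), and, per identity provider, the pages with at least one entry in `found_sso_requests`. It only relies on the result fields described above.

```python
import copy
import json
from collections import Counter
from pathlib import Path

COOKIEBLOCK_EXTENSION = "chromium/cookieblock-v1.1.1_0"


def prepare_single_page_config(template: dict, page: str, tranco_id: int,
                               headless: bool = True,
                               add_cookieblock: bool = False) -> dict:
    """Return an adapted copy of the example single-page configuration.

    Assumption: page, tranco_id and browser_config are top-level keys.
    cookie_banner_procedure is not changed; set it in the template.
    """
    config = copy.deepcopy(template)  # never modify the loaded template
    config["page"] = page
    config["tranco_id"] = tranco_id
    browser = config.setdefault("browser_config", {})
    browser["headless"] = headless
    if add_cookieblock:
        extensions = browser.setdefault("extensions", [])
        if COOKIEBLOCK_EXTENSION not in extensions:
            extensions.append(COOKIEBLOCK_EXTENSION)
    return config


def summarize_partial_results(output_dir: Path) -> dict:
    """Summarize all partial leak result files below output_dir."""
    scans = aborted = 0
    pages_per_idp: Counter = Counter()
    for path in sorted(output_dir.rglob("*.json")):
        data = json.loads(path.read_text(encoding="utf-8"))
        result = data.get("analysis_result") if isinstance(data, dict) else None
        if not isinstance(result, dict):
            continue  # not a scan result file
        scans += 1
        if result.get("thrown_exception") is not None:
            aborted += 1
        # Count each IdP at most once per page, even if it leaked several times.
        for idp in {req["idp"] for req in result.get("found_sso_requests") or []}:
            pages_per_idp[idp] += 1
    return {"scans": scans, "aborted": aborted,
            "pages_per_idp": dict(pages_per_idp)}
```

For example, load the template with `json.loads(Path("run-configurations/partial-leaks/single-page-configuration.json").read_text())`, write the adapted configuration to a new file with `json.dumps`, pass that file via `--analysis-config`, and afterwards call `summarize_partial_results(Path("/tmp/scanning-results-partial-leaks"))`.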

2. To run a scan for full leaks, you first have to run the evaluation script for partial leaks as described in [2 - Evaluation on the website](https://sso-privacy-leak.info/#evaluation) or use the [results from our analysis](https://files.sso-privacy-leak.info/share/W6rrqEnb/2-Evaluation/Partial%20Leaks/Data%20Tranco%20Top%201M/exported_lreq.csv). With this result (`exported_lreq.csv`), you can then start the so-called `privacy_trace` task.

   
   1. **Giving consent**: 
      This step is mandatory when scanning for full leaks. However, as described in Section 5.2, we also checked that identity providers do not leak sensitive tokens without user consent. If you want to verify this yourself, you can skip this step; the result will then be an empty CSV.

      To recreate our testing setup, the first step is to give consent on all websites with partial leaks. Unfortunately, we cannot provide a logged-in profile for all the partial leak websites we found. Therefore, this process must be done manually, for example in your normal browser. Look into your `exported_lreq.csv` and visit each website. Then, find the button that allows you to sign in with one of the providers you want to analyze (in our analysis, we used Facebook, Google, Microsoft, and Newscorp Australia) and complete the sign-in process. This includes the following steps:
      1. Logging into the identity provider (if you're not already logged in)
      2. Confirming the consent
      3. Being successfully redirected back to the relying party. 

      Unfortunately, we cannot provide anything to speed up this process. We did this manually so that we could also document the number of misconfigurations, websites that include SDKs without providing social logins, and other problems, as described in Section 5.2.

   2. **Create a logged-in browser profile**: Execute the following command inside the directory `sso-monitor/app`, log in to the IdP, and press Enter in the terminal once you are done:

      `PYTHONPATH=$PYTHONPATH:$(pwd) bash -c 'pipenv run python ./tasks/privacy_trace.py'`

      This creates the profile at the path `/tmp/browser_profile`.
   3. **Run the full leak scan**: You can use the example configuration file at `run-configurations/full-leaks/scan-configuration.json` ([see here](https://files.sso-privacy-leak.info/share/W6rrqEnb/1-Scanning/Code/run-configurations/full-leaks/scan-configuration.json)):
      * `scan_configuration` Defines the result CSV file of the evaluation script for partial leaks (see [2 - Evaluation](https://files.sso-privacy-leak.info/share/W6rrqEnb/2-Evaluation/Evaluation%20Scripts/README.md)), the path to the created browser profile, and which identity provider should be scanned
      * `analysis_config` Please refer to the information on the single page analysis above (1.1)

      Update all parameters (e.g. set the IdP to be tested to `true` and set the correct paths) before running your analyses. Note that the scan must be run separately for each IdP: setting multiple IdPs to `true` will result in errors and false negatives.

      
      To start the scan, you can use the following command:

      `pipenv run python cli.py -o /tmp/scanning-results-full-leaks privacy_trace --scan-config <path-to-the-config-file> --thread-count 3 --max-run-time 300`

      The results from our scan for full leaks can be found in our artifacts ([see Full-Leak-Scans](https://files.sso-privacy-leak.info/share/W6rrqEnb/1-Scanning/Full-Leak-Scans/)).

      **Please note:** The detection of escalated leaks does not require an additional scan. All escalated leaks can be detected from the recorded full leak scan artifacts. This is done in the [Full & Escalation Leak Evaluation](https://files.sso-privacy-leak.info/api/public/dl/W6rrqEnb/2-Evaluation/Evaluation%20Scripts/README.md?inline=true#full--escalation-leak-evaluation) part of the evaluation.

   **Result Information:**
   Like the partial leak scans, the results are json files.
   * `analysis_config`: This field contains all the settings used for the scan request, e.g. the content of the `run-configurations/full-leaks/scan-configuration.json` file.
   * `analysis_result`: This field contains the results of the scan process:
      * `found_lreq`: This field contains the information about the partial leak found in the partial leak scan (see `found_sso_requests` array in 1). Please refer to the result information in 1.
      * `full_leak`: The authentication response from the identity provider to the relying party that triggered the detection of a full leak. 
      * `full_leak_type`: There are several types of full leaks. This field indicates what type of full leak was found. Possible values are: 
         * `AUTH RESPONSE` for a full leak via an HTTP response.
         * `FB-AR HEADER` for a full leak from Facebook that was sent over a non-standardized channel as an FB-AR header.
         * `FB-S` for a full leak from Facebook, the same as the FB-AR header, but containing only information about the user's current login state.
         * `GOOGLE POST MESSAGE AUTH RESULT` or `GOOGLE POST MESSAGE RESPONSE` for a full leak from Google sent via a non-standard post message channel.
         * `POST MESSAGE RESPONSE` for a full leak sent via generic post messages.
         * `FRAGMENT REDIRECT` for a full leak sent as a fragment redirect.
         * `FORM POST` for a full leak via a form post.
      * `duration`: The time in seconds it took to complete the scan.
      * `screenshot` and `har`: Depending on the analysis configuration (`artifacts_config`), these fields contain the screenshot and / or the HAR file of the scan. If no artifacts were saved due to the configuration settings, these fields are `null`. The data is compressed and base64 encoded.
      * `errors_while_running`: Unlike the partial leak scanning, this scan will not abort if an exception is thrown. Instead, any errors that occur during the scan are stored in this array.
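
The full leak steps can also be supported by small scripts. The sketch below uses only the standard library and introduces three hypothetical helpers that are not part of SSO-Monitor. `split_config_per_idp` creates one configuration per IdP with exactly one IdP enabled, because the scan must run separately for each IdP. Where the IdP flags live inside `scan_configuration` is not documented here, so the key path and the IdP names are parameters you must look up in `scan-configuration.json`. `decode_artifact` restores a `screenshot` or `har` field by undoing the base64 encoding and then the compression; that the compression is zlib is an assumption based on the `*_zlib_compressed` field names of the partial leak results. `summarize_full_results` counts full leaks per `full_leak_type` and the scans with entries in `errors_while_running`.

```python
import base64
import copy
import json
import zlib
from collections import Counter
from pathlib import Path


def split_config_per_idp(config: dict, flags_path: list[str],
                         idps: list[str]) -> dict[str, dict]:
    """Return {idp: config copy in which only that IdP's flag is True}.

    flags_path is the (assumed) key path to the dict holding the IdP flags;
    look up the real location in scan-configuration.json.
    """
    variants = {}
    for idp in idps:
        variant = copy.deepcopy(config)
        container = variant
        for key in flags_path:
            container = container[key]
        for name in idps:
            if name not in container:
                raise KeyError(f"IdP flag {name!r} not found at {flags_path}")
            container[name] = name == idp
        variants[idp] = variant
    return variants


def decode_artifact(value: str) -> bytes:
    """Undo the base64 encoding, then the (assumed) zlib compression."""
    return zlib.decompress(base64.b64decode(value))


def summarize_full_results(output_dir: Path) -> dict:
    """Count full leak types and scans that recorded errors."""
    leak_types: Counter = Counter()
    scans = with_errors = 0
    for path in sorted(output_dir.rglob("*.json")):
        data = json.loads(path.read_text(encoding="utf-8"))
        result = data.get("analysis_result") if isinstance(data, dict) else None
        if not isinstance(result, dict):
            continue  # not a scan result file
        scans += 1
        if result.get("full_leak_type"):
            leak_types[result["full_leak_type"]] += 1
        if result.get("errors_while_running"):
            with_errors += 1  # the scan continued despite these errors
    return {"scans": scans, "with_errors": with_errors,
            "full_leak_types": dict(leak_types)}
```

Each configuration returned by `split_config_per_idp` can be written to its own file with `json.dumps` and passed to a separate `privacy_trace` run via `--scan-config`.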