{"id":19921,"date":"2021-02-03T20:25:13","date_gmt":"2021-02-03T14:55:13","guid":{"rendered":"https:\/\/valeurbit.com\/blog\/?p=19921"},"modified":"2021-02-12T17:57:09","modified_gmt":"2021-02-12T12:27:09","slug":"googles-certificate-transparency-as-a-data-source-for-attack-prevention","status":"publish","type":"post","link":"https:\/\/valeurbit.com\/blog\/googles-certificate-transparency-as-a-data-source-for-attack-prevention\/","title":{"rendered":"Google&#8217;s Certificate Transparency As A Data Source For Attack Prevention"},"content":{"rendered":"\n<p>We&#8217;ve prepared a&nbsp;two-part&nbsp;translation of Ryan Sears&#8217;s article on handling&nbsp;<a href=\"https:\/\/www.certificate-transparency.org\/\">Google&#8217;s Certificate Transparency<\/a>&nbsp;logs.&nbsp;The first part gives an overview of the structure of the logs and provides sample Python code for parsing records from these logs.&nbsp;The second part is devoted to obtaining all certificates from the available logs and setting up Google BigQuery to store and search the collected data.<\/p>\n\n\n\n<p>Three years have passed since the original was written, and since then the number of available logs and, accordingly, of entries in them has grown many times over.&nbsp;That makes it all the more important to process the logs correctly if the goal is to collect as much data as possible.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Part 1. 
Parsing Certificate Transparency Logs Like a Boss<\/h2>\n\n\n\n<p>During the development of our first project,&nbsp;<a href=\"https:\/\/phishfinder.io\/\">phishfinder<\/a>, I spent a lot of time thinking about the anatomy of phishing attacks and the data sources that would allow us to identify traces of upcoming phishing campaigns before they can cause any real damage.<\/p>\n\n\n\n<p>One of the sources we&#8217;ve integrated (and definitely one of the best) is the Certificate Transparency Log (CTL), a project started by&nbsp;<a href=\"https:\/\/en.wikipedia.org\/wiki\/Ben_Laurie\">Ben Laurie<\/a>&nbsp;and&nbsp;Adam Langley&nbsp;at Google.&nbsp;Essentially, a CTL is a log containing an immutable list of certificates issued by a CA, which is stored in a Merkle tree, allowing each certificate to be cryptographically verified if necessary.<\/p>\n\n\n\n<p>To understand how much data we will have to deal with, let&#8217;s see how many entries each log in the list published on the CTL website contains:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>import requests\nimport locale\n\nlocale.setlocale(locale.LC_ALL, 'en_US')\n\n<em># Fetch the list of known CT logs<\/em>\nctl_log = requests.get(\n    'https:\/\/www.gstatic.com\/ct\/log_list\/log_list.json'\n).json()\n\ntotal_certs = 0\n\n<em># locale.format() was removed in Python 3.12; format_string() replaces it<\/em>\nhuman_format = lambda x: locale.format_string('%d', x, grouping=True)\n\nfor log in ctl_log&#91;'logs']:\n    log_url = log&#91;'url']\n    try:\n        <em># The signed tree head (STH) reports the number of entries in the log<\/em>\n        log_info = requests.get(\n            'https:\/\/{}\/ct\/v1\/get-sth'.format(log_url),\n            timeout=3\n        ).json()\n        total_certs += int(log_info&#91;'tree_size'])\n    except Exception:\n        <em># Skip logs that are unreachable or return malformed data<\/em>\n        continue\n\n    print(\"{} has {} certificates\".format(\n        log_url,\n        human_format(log_info&#91;'tree_size'])\n    ))\n\nprint(\"Total certs -&gt; {}\".format(human_format(total_certs)))<\/code><\/pre>\n\n\n\n<p>The output:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>ct.googleapis.com\/pilot has 92,224,404 certificates\nct.googleapis.com\/aviator has 
46,466,472 certificates\nct1.digicert-ct.com\/log has 1,577,183 certificates\nct.googleapis.com\/rocketeer has 89,391,361 certificates\nct.ws.symantec.com has 3,562,198 certificates\nctlog.api.venafi.com has 94,797 certificates\nvega.ws.symantec.com has 200,401 certificates\nctserver.cnnic.cn has 5,081 certificates\nctlog.wosign.com has 1,387,492 certificates\nct.startssl.com has 293,374 certificates\nct.googleapis.com\/skydiver has 1,249,079 certificates\nct.googleapis.com\/icarus has 48,585,765 certificates\nTotal certs -&gt; 285,037,607<\/code><\/pre>\n\n\n\n<p>That is 285,037,607 certificates at the time of writing.&nbsp;This is not a huge amount of data, but it still takes some effort to organize storage and retrieval of the certificates efficiently.&nbsp;More on this in the second part.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Anatomy of a CTL<\/h2>\n\n\n\n<p>Records are retrieved from a CTL over HTTP, so fetching the data with modern libraries is easy.&nbsp;Unfortunately, the records themselves are confusing binary structures, which complicates parsing somewhat.&nbsp;An example of a log entry:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code><em>\/\/ curl -s 'https:\/\/ct1.digicert-ct.com\/log\/ct\/v1\/get-entries?start=0&amp;end=0' | jq .<\/em>\n{\n  \"entries\": &#91;\n    {\n      \"leaf_input\": \"AAAAAAFIyfaldAAAAAcDMIIG\/zCCBeegAwIBAgI...\",\n      \"extra_data\": \"AAiJAAS6MIIEtjCCA56gAwIBAgIQDHmpRLCMEZU...\"\n    }\n  ]\n}<\/code><\/pre>\n\n\n\n<p>Each record contains the fields&nbsp;<code>leaf_input<\/code>&nbsp;and&nbsp;<code>extra_data<\/code>, both base64-encoded.&nbsp;Referring to&nbsp;RFC 6962,&nbsp;we see that&nbsp;<code>leaf_input<\/code>&nbsp;is an encoded&nbsp;MerkleTreeLeaf&nbsp;structure, and&nbsp;<code>extra_data<\/code>&nbsp;is either a certificate chain (for X509 entries) or a&nbsp;PrecertChainEntry&nbsp;(for PreCert entries).<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">About PreCerts<\/h2>\n\n\n\n<p>It took me quite a long time to figure out what PreCert is all about (you 
can try it yourself by reading the&nbsp;RFC; apparently,&nbsp;I&#8217;m not the only one). I will save you a lot of time thinking and searching on Google and summarize the purpose of PreCerts as follows:<\/p>\n\n\n\n<p>PreCerts are a separate type of certificate issued by a CA before it issues a \u201creal\u201d certificate.&nbsp;In essence, a PreCert is a copy of the real certificate that also contains a special X.509 v3 extension called&nbsp;<strong><em>poison<\/em><\/strong>, which is marked critical.&nbsp;As a result, the certificate will not be validated either by platforms that recognize this extension and know that it is a PreCert, or by platforms that do not recognize the extension.<\/p>\n\n\n\n<p>My experience in information security suggests that such a measure is not very effective, if only because bugs in X.509 \/ ASN.1 parsing are quite common and individual implementations may be vulnerable to various shenanigans that will ultimately allow a PreCert to be validated.&nbsp;I understand why this was done, but it seems that removing PreCerts completely and leaving only the certificates actually issued by the CA in the CTL would be much wiser.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Parsing binary structures<\/h2>\n\n\n\n<p>As someone who does reverse engineering and occasionally takes part in CTFs, I am no stranger to parsing binary structures.&nbsp;Most people turn to the&nbsp;<strong>struct<\/strong>&nbsp;module in such cases, but many years ago, while I was working for&nbsp;Phillip Martin, he introduced me to the excellent&nbsp;Construct&nbsp;library, which makes parsing such structures much easier.&nbsp;Below are the structures I used for parsing, as well as an example of using them to process records:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>from construct import \\\n    Struct, Byte, Int16ub, Int64ub, Enum, Bytes, Int24ub, this, \\\n    GreedyBytes, GreedyRange, Terminated\n\nMerkleTreeHeader = Struct(\n    \"Version\"         \/ Byte,\n    \"MerkleLeafType\"  \/ Byte,\n    \"Timestamp\"       \/ Int64ub,\n    \"LogEntryType\"    \/ Enum(Int16ub, X509LogEntryType=0, PrecertLogEntryType=1),\n    \"Entry\"           \/ GreedyBytes\n)\n\nCertificate = Struct(\n    \"Length\" \/ Int24ub,\n    \"CertData\" \/ Bytes(this.Length)\n)\n\nCertificateChain = Struct(\n    \"ChainLength\" \/ Int24ub,\n    \"Chain\" \/ GreedyRange(Certificate),\n)\n\nPreCertEntry = Struct(\n    \"LeafCert\" \/ Certificate,\n    <em># Chain fields are inlined here: Embedded() was removed in Construct 2.10<\/em>\n    \"ChainLength\" \/ Int24ub,\n    \"Chain\" \/ GreedyRange(Certificate),\n    Terminated\n)<\/code><\/pre>\n\n\n\n<pre class=\"wp-block-code\"><code>import json\nimport base64\n\nimport ctl_parser_structures\n\nfrom OpenSSL import crypto\n\nentry = json.loads(\"\"\"\n{\n  \"entries\": &#91;\n    {\n      \"leaf_input\": \"AAAAAAFIyfaldAAAAAcDMIIG\/zCCBeegAwIBAgIQ...\",\n      \"extra_data\": \"AAiJAAS6MIIEtjCCA56gAwIBAgIQDHmpRLCMEZUg...\"\n    }\n  ]\n}\n\"\"\")&#91;'entries']&#91;0]\n\nleaf_cert = ctl_parser_structures.MerkleTreeHeader.parse(\n    base64.b64decode(entry&#91;'leaf_input'])\n)\n\nprint(\"Leaf Timestamp: {}\".format(leaf_cert.Timestamp))\nprint(\"Entry Type: {}\".format(leaf_cert.LogEntryType))\n\nif leaf_cert.LogEntryType == \"X509LogEntryType\":\n    <em># The entry is a regular X509 certificate<\/em>\n    cert_data_string = ctl_parser_structures.Certificate.parse(\n        leaf_cert.Entry).CertData\n    chain = &#91;\n        crypto.load_certificate(crypto.FILETYPE_ASN1, cert_data_string)\n    ]\n\n    <em># Parse the `extra_data` structure<\/em>\n    <em># to get the rest of the chain<\/em>\n    extra_data = ctl_parser_structures.CertificateChain.parse(\n        base64.b64decode(entry&#91;'extra_data'])\n    )\n    for cert in extra_data.Chain:\n        chain.append(\n            crypto.load_certificate(crypto.FILETYPE_ASN1, cert.CertData)\n        )\nelse:\n    <em># The entry is a PreCert<\/em>\n    extra_data = ctl_parser_structures.PreCertEntry.parse(\n        base64.b64decode(entry&#91;'extra_data'])\n    )\n    chain = &#91;\n        crypto.load_certificate(\n            crypto.FILETYPE_ASN1, extra_data.LeafCert.CertData\n        )\n    ]\n\n    for cert in extra_data.Chain:\n        chain.append(\n            crypto.load_certificate(crypto.FILETYPE_ASN1, cert.CertData)\n        )<\/code><\/pre>\n\n\n\n<p>The result is an array of X509 certificates from the chain, with the certificate from&nbsp;<code>leaf_input<\/code>&nbsp;as the first element.<\/p>\n\n\n\n<p>As you can see, Construct makes it pretty easy to define binary structures in Python.<\/p>\n\n\n\n<p>Now that we understand what a CTL is and how to parse individual records, we can move on to the second part: retrieving and saving all records from the logs so that the certificates can be searched later.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Part 2. 
Retrieving, Storing and Querying 250M+ Certificates Like a Boss<\/h2>\n\n\n\n<p><strong>Collecting certificates<\/strong><\/p>\n\n\n\n<p>According to the&nbsp;<a href=\"https:\/\/tools.ietf.org\/html\/rfc6962#section-4.6\">RFC<\/a>, log entries are retrieved through the&nbsp;<code>get-entries<\/code>&nbsp;endpoint.&nbsp;Unfortunately, the task is complicated by a limit on the number of records that can be fetched in one request (the range is set by the&nbsp;<code>start<\/code>&nbsp;and&nbsp;<code>end<\/code>&nbsp;parameters): most logs return only 64 records at a time.&nbsp;However, Google&#8217;s CTLs, which make up the majority of all logs, allow up to 1024 entries per request.<\/p>\n\n\n\n<p>Since the task is both&nbsp;<em>IO-bound<\/em>&nbsp;(fetching records over HTTP) and&nbsp;<em>CPU-bound<\/em>&nbsp;(parsing certificates), processing it efficiently requires combining asynchronous I\/O with multiprocessing.<\/p>\n\n\n\n<p>Since there were no tools that would allow you to easily and painlessly get and parse all CTLs (apart from the not particularly remarkable&nbsp;<a href=\"https:\/\/github.com\/google\/certificate-transparency\/tree\/master\/python\">utility from Google<\/a>), we decided to spend a little time and write a tool that would meet all our needs. 
The result was&nbsp;<a href=\"https:\/\/github.com\/calidog\/axeman\">Axeman<\/a>, which uses&nbsp;<a href=\"https:\/\/docs.python.org\/3\/library\/asyncio.html\">asyncio<\/a>&nbsp;and the wonderful&nbsp;<a href=\"https:\/\/github.com\/dano\/aioprocessing\">aioprocessing<\/a>&nbsp;library&nbsp;to download, parse and save certificates to multiple CSV files, limited only by the speed of the Internet connection.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Exploiting the cloud<\/h2>\n\n\n\n<p>After getting an instance (translator&#8217;s note: this is what VMs are called in Google Cloud) with 16 cores, 32GB of memory and a 750GB SSD (thanks to Google for the free $300 credit for new accounts!), I launched Axeman, which downloaded all the certificates in less than a day and saved the results in&nbsp;<code>\/tmp\/certificates\/$CTL_DOMAIN\/<\/code>.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Where is all this data stored?<\/h2>\n\n\n\n<p>Initially, Postgres was chosen to store and search the data.&nbsp;I have no doubt that with the right schema Postgres could easily handle 250 million records (unlike my first attempt, which took about 20 minutes!), but I started looking for solutions that:<\/p>\n\n\n\n<ul><li>allow cheap storage of large amounts of data<\/li><li>provide quick search<\/li><li>allow easy data updates<\/li><\/ul>\n\n\n\n<p>There were several options, but almost all of the ones considered (AWS RDS, Heroku Postgres, Google Cloud SQL) were very expensive.&nbsp;Fortunately, since our data essentially never changes, we have additional flexibility in choosing a platform to host it.<\/p>\n\n\n\n<p>In general, this is exactly the type of data search that fits perfectly with a map\/reduce model using, for example, Spark or Hadoop Pig.&nbsp;Looking through the offerings of various providers in the \u201cbig data\u201d category (although our task clearly does not involve enough data to belong in this category), I came across 
Google BigQuery, which meets all of the requirements listed above.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Feeding BigQuery data<\/h2>\n\n\n\n<p>Loading data into BigQuery is pretty easy, thanks to Google&#8217;s&nbsp;<a href=\"https:\/\/cloud.google.com\/storage\/docs\/gsutil\">gsutil<\/a>&nbsp;utility.&nbsp;We create a new bucket for our certificates:<\/p>\n\n\n\n<p>When the bucket is ready, we use&nbsp;<strong>gsutil<\/strong>&nbsp;to upload all the certificates to Google Cloud Storage (and later into BigQuery).&nbsp;After setting up the account with the command&nbsp;<code>gsutil config<\/code>, we start the upload:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>gsutil -o GSUtil:parallel_composite_upload_threshold=150M \\\n       -m cp \\\n       \/tmp\/certificates\/* \\\n       gs:\/\/all-certificates<\/code><\/pre>\n\n\n\n<p>And we see the following result in our bucket:<\/p>\n\n\n\n<p>Next, we create a new dataset in BigQuery:<\/p>\n\n\n\n<p>Now we can import data from the bucket into our new dataset.&nbsp;Unfortunately, BigQuery does not have a \u201cplease import all folders recursively\u201d button, so you have to import each CTL separately, but it doesn&#8217;t take that long.&nbsp;Create a table and import our first log (pay special attention to the marked settings):<\/p>\n\n\n\n<p>Since the schema is needed every time the next log is imported, let&#8217;s use the \u201cEdit as Text\u201d option.&nbsp;The schema used:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>&#91;\n    {\n        \"name\": \"url\",\n        \"type\": \"STRING\",\n        \"mode\": \"REQUIRED\"\n    },\n    {\n        \"mode\": \"REQUIRED\",\n        \"name\": \"cert_index\",\n        \"type\": \"INTEGER\"\n    },\n    {\n        \"mode\": \"REQUIRED\",\n        \"name\": \"chain_hash\",\n        \"type\": \"STRING\"\n    },\n    {\n        \"mode\": \"REQUIRED\",\n        \"name\": \"cert_der\",\n        \"type\": \"STRING\"\n    },\n    {\n        \"mode\": \"REQUIRED\",\n        \"name\": \"all_dns_names\",\n        \"type\": \"STRING\"\n    },\n    {\n        \"mode\": \"REQUIRED\",\n        \"name\": \"not_before\",\n        \"type\": \"FLOAT\"\n    },\n    {\n        \"mode\": \"REQUIRED\",\n        \"name\": \"not_after\",\n        \"type\": \"FLOAT\"\n    }\n]<\/code><\/pre>\n\n\n\n<p>Then we just repeat the process for each log.&nbsp;Make sure each import succeeds (errors can usually be ignored, just make sure you set an adequate value for the maximum number of errors).&nbsp;As a result, you should get something like the following dataset:<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">The end result<\/h2>\n\n\n\n<p>Now is the time to reap the benefits of our labors and test the system with various queries.<\/p>\n\n\n\n<p>Recently, there has been a&nbsp;<a href=\"https:\/\/en.wikipedia.org\/wiki\/IDN_homograph_attack\">lot of<\/a>&nbsp;talk about domains using&nbsp;<a href=\"https:\/\/en.wikipedia.org\/wiki\/Punycode\">punycode<\/a>&nbsp;and the related&nbsp;<a href=\"https:\/\/en.wikipedia.org\/wiki\/IDN_homograph_attack\">homoglyph attacks<\/a>.&nbsp;Let&#8217;s try the following query:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>SELECT\n  all_dns_names\nFROM\n  &#91;ctl-lists:certificate_data.scan_data]\nWHERE\n  (REGEXP_MATCH(all_dns_names,r'\\b?xn\\-\\-'))\n  AND NOT all_dns_names CONTAINS 'cloudflare'<\/code><\/pre>\n\n\n\n<p>And in just 15 seconds we get a result containing all punycode domains from all known CTLs!<\/p>\n\n\n\n<p>Let&#8217;s look at another example and try to get all certificates for Coinbase domains recorded in Certificate Transparency:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>SELECT\n  all_dns_names\nFROM\n  &#91;ctl-lists:certificate_data.scan_data]\nWHERE\n  (REGEXP_MATCH(all_dns_names,r'.*\\.coinbase.com&#91;\\s$]?'))<\/code><\/pre>\n\n\n\n<p>In just two seconds, we get all the results we are interested in:<\/p>\n\n\n\n<p>The ability to easily perform this 
kind of analysis on such a large dataset is a powerful tool for identifying trends that would otherwise be impossible to detect.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">A small riddle<\/h2>\n\n\n\n<p>While doing my research, I discovered something strange.&nbsp;The domain&nbsp;<strong><em>flowers-to-the-world.com<\/em><\/strong>&nbsp;constantly appeared in various logs.&nbsp;Almost every log had a huge number of certificates containing this domain:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>SELECT\n  url,\n  COUNT(*) AS total_certs\nFROM\n  &#91;ctl-lists:certificate_data.scan_data]\nWHERE\n  (REGEXP_MATCH(all_dns_names,r'.*flowers-to-the-world.*'))\nGROUP BY\n  url\nORDER BY\n  total_certs DESC<\/code><\/pre>\n\n\n\n<p>Whois shows that this domain belongs to Google, so I&#8217;m wondering if this is part of some testing routine.&nbsp;If you are a Google engineer who can find out from colleagues working on Certificate Transparency, it would be very interesting to hear about it.&nbsp;A Google engineer answered in the comments below the original post.<\/p>\n\n\n\n<hr class=\"wp-block-separator\"\/>\n\n\n\n<p>Very soon we are planning to release our own product &#8211;&nbsp;<strong>Netlas.io.&nbsp;<\/strong>This is a kind of technical atlas of the entire Internet, which will include not only certificates, but also data on domains and subdomains, server responses on popular ports, and much other information useful for security researchers.<\/p>\n\n\n\n<p>In Russia, as far as we know, this is the first such product.&nbsp;We have strong competitors in the US and China, but we hope to outperform them in some ways.&nbsp;One example is data freshness: our search engine implementation can already include data from scans no more than a minute old in its results.&nbsp;Today&nbsp;<strong>Netlas.io<\/strong>&nbsp;is available in &#8220;early access&#8221; format.&nbsp;If you want to try it out, go to the site and 
register.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>We&#8217;ve prepared a&nbsp;two-part&nbsp;translation of Ryan Sears&#8217;s article on handling&nbsp;Google&#8217;s Certificate Transparency&nbsp;logs&nbsp;.&nbsp;The first part gives an overview of the structure of the logs and provides a sample Python code for parsing records from these logs.&nbsp;The second part is devoted to obtaining all certificates from the available logs and setting up the Google BigQuery system for storing&#8230;<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"closed","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[1],"tags":[],"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v16.5 - https:\/\/yoast.com\/wordpress\/plugins\/seo\/ -->\n<title>Google&#039;s Certificate Transparency As A Data Source For Attack Prevention | ValeurBit Infosec<\/title>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/valeurbit.com\/blog\/googles-certificate-transparency-as-a-data-source-for-attack-prevention\/\" \/>\n<meta property=\"og:locale\" content=\"en_US\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"Google&#039;s Certificate Transparency As A Data Source For Attack Prevention | ValeurBit Infosec\" \/>\n<meta property=\"og:description\" content=\"We&#8217;ve prepared a&nbsp;two-part&nbsp;translation of Ryan Sears&#8217;s article on handling&nbsp;Google&#8217;s Certificate Transparency&nbsp;logs&nbsp;.&nbsp;The first part gives an overview of the structure of the logs and provides a sample Python code for parsing records from these logs.&nbsp;The second part is devoted to obtaining all certificates from the available logs and setting up the Google BigQuery system for storing...\" \/>\n<meta property=\"og:url\" 
content=\"https:\/\/valeurbit.com\/blog\/googles-certificate-transparency-as-a-data-source-for-attack-prevention\/\" \/>\n<meta property=\"og:site_name\" content=\"ValeurBit Infosec\" \/>\n<meta property=\"article:publisher\" content=\"https:\/\/www.facebook.com\/valeurbitinfo\/\" \/>\n<meta property=\"article:published_time\" content=\"2021-02-03T14:55:13+00:00\" \/>\n<meta property=\"article:modified_time\" content=\"2021-02-12T12:27:09+00:00\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:creator\" content=\"@valeurbit\" \/>\n<meta name=\"twitter:site\" content=\"@valeurbit\" \/>\n<meta name=\"twitter:label1\" content=\"Est. reading time\" \/>\n\t<meta name=\"twitter:data1\" content=\"9 minutes\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\/\/schema.org\",\"@graph\":[{\"@type\":\"Organization\",\"@id\":\"https:\/\/valeurbit.com\/blog\/#organization\",\"name\":\"Valeurbit Infosec\",\"url\":\"https:\/\/valeurbit.com\/blog\/\",\"sameAs\":[\"https:\/\/www.facebook.com\/valeurbitinfo\/\",\"https:\/\/www.instagram.com\/valeurbit\",\"https:\/\/www.linkedin.com\/company\/valeurbit-infosec\/\",\"https:\/\/twitter.com\/valeurbit\"],\"logo\":{\"@type\":\"ImageObject\",\"@id\":\"https:\/\/valeurbit.com\/blog\/#logo\",\"inLanguage\":\"en-US\",\"url\":\"https:\/\/valeurbit.com\/blog\/wp-content\/uploads\/2021\/02\/Valeurbit-new-logo-center.png\",\"contentUrl\":\"https:\/\/valeurbit.com\/blog\/wp-content\/uploads\/2021\/02\/Valeurbit-new-logo-center.png\",\"width\":1080,\"height\":512,\"caption\":\"Valeurbit Infosec\"},\"image\":{\"@id\":\"https:\/\/valeurbit.com\/blog\/#logo\"}},{\"@type\":\"WebSite\",\"@id\":\"https:\/\/valeurbit.com\/blog\/#website\",\"url\":\"https:\/\/valeurbit.com\/blog\/\",\"name\":\"ValeurBit Infosec\",\"description\":\"Cyber Security 
Company\",\"publisher\":{\"@id\":\"https:\/\/valeurbit.com\/blog\/#organization\"},\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":\"https:\/\/valeurbit.com\/blog\/?s={search_term_string}\",\"query-input\":\"required name=search_term_string\"}],\"inLanguage\":\"en-US\"},{\"@type\":\"WebPage\",\"@id\":\"https:\/\/valeurbit.com\/blog\/googles-certificate-transparency-as-a-data-source-for-attack-prevention\/#webpage\",\"url\":\"https:\/\/valeurbit.com\/blog\/googles-certificate-transparency-as-a-data-source-for-attack-prevention\/\",\"name\":\"Google's Certificate Transparency As A Data Source For Attack Prevention | ValeurBit Infosec\",\"isPartOf\":{\"@id\":\"https:\/\/valeurbit.com\/blog\/#website\"},\"datePublished\":\"2021-02-03T14:55:13+00:00\",\"dateModified\":\"2021-02-12T12:27:09+00:00\",\"breadcrumb\":{\"@id\":\"https:\/\/valeurbit.com\/blog\/googles-certificate-transparency-as-a-data-source-for-attack-prevention\/#breadcrumb\"},\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\/\/valeurbit.com\/blog\/googles-certificate-transparency-as-a-data-source-for-attack-prevention\/\"]}]},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\/\/valeurbit.com\/blog\/googles-certificate-transparency-as-a-data-source-for-attack-prevention\/#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"Home\",\"item\":\"https:\/\/valeurbit.com\/blog\/\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"Google&#8217;s Certificate Transparency As A Data Source For Attack Prevention\"}]},{\"@type\":\"Article\",\"@id\":\"https:\/\/valeurbit.com\/blog\/googles-certificate-transparency-as-a-data-source-for-attack-prevention\/#article\",\"isPartOf\":{\"@id\":\"https:\/\/valeurbit.com\/blog\/googles-certificate-transparency-as-a-data-source-for-attack-prevention\/#webpage\"},\"author\":{\"@id\":\"https:\/\/valeurbit.com\/blog\/#\/schema\/person\/df20c1cd317765fa8677a3056caeccfa\"},\"headline\":\"Google&#8217;s 
Certificate Transparency As A Data Source For Attack Prevention\",\"datePublished\":\"2021-02-03T14:55:13+00:00\",\"dateModified\":\"2021-02-12T12:27:09+00:00\",\"mainEntityOfPage\":{\"@id\":\"https:\/\/valeurbit.com\/blog\/googles-certificate-transparency-as-a-data-source-for-attack-prevention\/#webpage\"},\"wordCount\":1769,\"publisher\":{\"@id\":\"https:\/\/valeurbit.com\/blog\/#organization\"},\"articleSection\":[\"Valeurbit\"],\"inLanguage\":\"en-US\"},{\"@type\":\"Person\",\"@id\":\"https:\/\/valeurbit.com\/blog\/#\/schema\/person\/df20c1cd317765fa8677a3056caeccfa\",\"name\":\"ValeurBit\",\"sameAs\":[\"https:\/\/valeurbit.com\/blog\"],\"url\":\"https:\/\/valeurbit.com\/blog\/author\/valeurbit\/\"}]}<\/script>\n<!-- \/ Yoast SEO plugin. -->","_links":{"self":[{"href":"https:\/\/valeurbit.com\/blog\/wp-json\/wp\/v2\/posts\/19921"}],"collection":[{"href":"https:\/\/valeurbit.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/valeurbit.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/valeurbit.com\/blog\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/valeurbit.com\/blog\/wp-json\/wp\/v2\/comments?post=19921"}],"version-history":[{"count":0,"href":"https:\/\/valeurbit.com\/blog\/wp-json\/wp\/v2\/posts\/19921\/revisions"}],"wp:attachment":[{"href":"https:\/\/valeurbit.com\/blog\/wp-json\/wp\/v2\/media?parent=19921"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/valeurbit.com\/blog\/wp-json\/wp\/v2\/categories?post=19921"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/valeurbit.com\/blog\/wp-json\/wp\/v2\/tags?post=19921"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}