How to prevent Phishing in 2020 - with some Python, Azure and Splunk

Start

Lately I have been experimenting a lot with different code snippets and scripts that can be found on GitHub. Especially in the OSINT field, GitHub is an important source of inspiration and valuable tools.

In the following I would like to introduce some of them and show how they can add value in a business context.

Especially in enterprises it is important to utilize existing infrastructure, so I will show the integration with Azure and Splunk - of course only as representatives of comparable setups.

The Python

DnsTwist

I want to start with a script that should not be missing from any toolbox. DNSTWIST can help you in your daily job of preventing users from being fooled by any sort of phishing. DNSTWIST identifies lookalike and typosquatting domains and, as a bonus, can also do a whois lookup for the registration date or verify whether an HTML page responds to a request.

To start, however, we only need to generate a CSV or JSON file. I will use this CSV in combination with Phishing Catcher, but it is also possible to import the file as a lookup into Splunk for basic alerting, or to use it as a blocklist in your proxy / e-mail gateway.

dnstwist --format csv yourdomain.com > dnstwist.csv
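
If you want to feed the result into a blocklist or a Splunk lookup, a few lines of Python are enough to extract the plain domain names from the CSV. A minimal sketch, assuming the column is called domain-name (newer dnstwist releases use domain - check your CSV header):

import csv

# Pull the permutated domain names out of the dnstwist CSV.
# The column is called 'domain-name' in older dnstwist releases
# and 'domain' in newer ones - check your CSV header.
with open('dnstwist.csv', newline='') as f:
    domains = [row.get('domain-name') or row.get('domain')
               for row in csv.DictReader(f)]

# One domain per line, usable as a blocklist or a Splunk lookup source
with open('blocklist.txt', 'w') as f:
    f.write('\n'.join(d for d in domains if d))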

Phishing Catcher

My next pick is a fork of Phishing Catcher. Phishing Catcher's original idea was to catch possible phishing domains in near real time by looking for suspicious TLS certificate issuances reported to the Certificate Transparency Log (CTL) via the CertStream API. Over the years the GitHub community extended its purpose - for example, the user Stoerchel adds value by using the scoring function against newly registered domains.

Both projects seem to be no longer maintained, so I tried to merge all useful additions and pull requests into a new fork. Now it is possible not only to check for suspicious phishing domains via the CertStream API, but also to check newly registered domains, using open whoisds.com data. In addition, the dnstwist domains can be included. I think it is helpful to raise the score if one of these domains shows up, especially in a business context. As a bonus, a Splunk-readable log is created (key/value).
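
To illustrate the scoring idea - a simplified sketch, not the actual code of the fork; DNSTWIST_BONUS and the base score heuristics are placeholders:

import csv

DNSTWIST_BONUS = 80  # arbitrary example value

def load_dnstwist_domains(path='dnstwist.csv'):
    # Same column-name caveat as above: 'domain-name' vs. 'domain'
    with open(path, newline='') as f:
        return {row.get('domain-name') or row.get('domain')
                for row in csv.DictReader(f)}

def adjusted_score(domain, base_score, dnstwist_domains):
    # base_score stands in for Phishing Catcher's keyword heuristics;
    # a hit in the dnstwist list raises it, because a permutation of
    # your own domain is far more relevant in a business context.
    if domain in dnstwist_domains:
        return base_score + DNSTWIST_BONUS
    return base_score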

It is still a work in progress, with room for improvement.

Twitter IoCs

The idea behind the last script arises from the need for speed in URL blocklists. Most threat intelligence feeds lag somewhat behind, in a field where every second counts, especially in the battle against Emotet or TA505.

To increase the speed you need to get the data from the source - Twitter.

A first prototype was created using one of the anonymous Twitter parsers from GitHub. However, it quickly became apparent that without the official API, no reliable service is possible. After successfully applying for API access at developer.twitter.com, Tweepy is the tool of choice.

import tweepy as tw

consumer_key= 'xxxx'
consumer_secret= 'xxxx'
#access_token= 'yourkeyhere'
#access_token_secret= 'yourkeyhere'

auth = tw.OAuthHandler(consumer_key, consumer_secret)
#Only for Access with User Context
#auth.set_access_token(access_token, access_token_secret)
api = tw.API(auth, wait_on_rate_limit=True)

# Define the search term and the date_since date as variables - max last 7 days
# Twitter Operator e.g -filter:retweets
# https://developer.twitter.com/en/docs/twitter-api/v1/rules-and-filtering/overview/standard-operators
search_words = "#TA505 -filter:retweets"
date_since = "2020-09-20"

# Collect tweets
tweets = tw.Cursor(api.search,
                   q=search_words,
                   lang="en",
                   since=date_since,
                   tweet_mode="extended").items(5)

# Iterate and print tweets. With tweet_mode="extended" the text is in
# full_text instead of text; retweets carry it in retweeted_status.
for tweet in tweets:
    status = tweet._json.get('retweeted_status', tweet._json)
    print(tweet.created_at, status['full_text'], tweet.user.name, tweet.id)

A few remarks. If it is not necessary to interact with Twitter (tweet, retweet, like), the consumer_key and consumer_secret are sufficient. The next insight concerns the so-called operators with which the result set can be restricted; with -filter:retweets, for example, retweets are excluded. Last but not least, the advice to use tweet_mode="extended", since otherwise the tweets are truncated.

To extract the indicators of compromise, the Python module python-iocextract is used, and a Splunk-readable log is created (key/value).
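
A minimal sketch of that step, building on the Tweepy loop above (the log format and file name are my own choices):

import iocextract

def extract_and_log(tweet, logfile='twitter_iocs.log'):
    status = tweet._json.get('retweeted_status', tweet._json)
    # refang=True turns defanged indicators like hxxp://evil[.]com
    # back into usable URLs
    with open(logfile, 'a') as f:
        for url in iocextract.extract_urls(status['full_text'], refang=True):
            # key/value pairs so Splunk can extract the fields at search time
            f.write(f'time="{tweet.created_at}", tweet_id="{tweet.id}", '
                    f'user="{tweet.user.name}", url="{url}"\n')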

Another useful addition could be a quick check against threat intelligence APIs to see whether an entry already exists.
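
One option would be abuse.ch's URLhaus, which offers a free lookup API (endpoint as documented on urlhaus-api.abuse.ch; no API key required at the time of writing):

import requests

def known_to_urlhaus(url):
    # query_status is 'ok' if URLhaus already knows the URL,
    # 'no_results' otherwise
    response = requests.post('https://urlhaus-api.abuse.ch/v1/url/',
                             data={'url': url}, timeout=10)
    return response.json().get('query_status') == 'ok'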

An alternative is to use the feeds from www.twitterioc.com. Currently the data is updated only once a day, so the time factor mentioned at the beginning of this article is a problem here as well. According to the owner, that is going to change in the future. The feed could then be integrated directly into Splunk (for example via the Threat Intelligence framework of Enterprise Security).

Azure Functions

The next challenge is how and where the scripts can be executed - ideally with as little resource consumption as possible and with a simple cronjob. One relatively new option is so-called Azure Functions.

Azure Functions allows you to run small pieces of code (called “functions”) without worrying about application infrastructure. With Azure Functions, the cloud infrastructure provides all the up-to-date servers you need to keep your application running at scale.

This allows serverless execution of Python scripts. Azure Functions need so-called triggers to start the execution. This can be an HTTP request or, what we need here, a timer trigger.

A function is “triggered” by a specific type of event. Supported triggers include responding to changes in data, responding to messages, running on a schedule, or as the result of an HTTP request.

Details about the implementation can be found in Microsoft's docs, and there are repositories on GitHub with plenty of examples.
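
As a starting point, a timer-triggered function in Python looks roughly like this; the schedule lives in the accompanying function.json, and the logging call stands in for whichever of the scripts above you want to run:

import datetime
import logging

import azure.functions as func

def main(mytimer: func.TimerRequest) -> None:
    # The schedule is defined in function.json as an NCRONTAB expression
    # with seconds, e.g. "0 0 * * * *" = once every hour.
    if mytimer.past_due:
        logging.info('The timer is past due!')

    # This is where one of the scripts above would be called.
    logging.info('Timer trigger fired at %s',
                 datetime.datetime.utcnow().isoformat())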

Splunk

To close the cycle, the last step is the integration into the SIEM (here Splunk).

There are many ways to send data to Splunk. The easiest option is to write the data via the HTTP Event Collector (HEC):

The HTTP Event Collector (HEC) is a fast and efficient way to send data to Splunk Enterprise and Splunk Cloud. Notably, HEC enables you to send data over HTTP (or HTTPS) directly to Splunk Enterprise or Splunk Cloud from your application. HEC was created with application developers in mind, so that all it takes is a few lines of code added to an app for the app to send data. Also, HEC is token-based, so you never need to hard-code your Splunk Enterprise or Splunk Cloud credentials in your app or supporting files. HTTP Event Collector provides a new way for developers to send application logging and metrics directly to Splunk Cloud and Splunk Enterprise via HTTP in a highly efficient and secure manner.

import json
import requests

# Send the data via HEC to Splunk
def send_to_splunk(data):
    # https://medium.com/adarma-tech-blog/splunk-http-event-collectors-explained-2c22e87ab8d2
    url = 'https://yoursplunk.server:8088/services/collector'
    authHeader = {'Authorization': 'Splunk your-api-token-from-splunk'}

    for event in data:
        jsonData = json.dumps(event, separators=(',', ':'))
        post_data = {
            "sourcetype": "json",
            "event": jsonData
        }
        # verify=False skips TLS verification - fine for testing,
        # use a proper certificate in production
        response = requests.post(url, headers=authHeader, json=post_data,
                                 verify=False)
        print("Data sent to Splunk HEC: " + response.text)

As an alternative, it is possible to write a log that can be processed directly without major adjustments (key/value). Details are explained in the Splunk logging best practices.

One of the most powerful features of the Splunk platform is its ability to extract fields from events when you search, creating structure out of unstructured data. To make sure field extraction works as intended, use the following string syntax (using spaces and commas is fine). key1=value1, key2=value2, …
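
In practice this boils down to a one-liner; quoting the values is my own precaution for values that contain spaces:

def to_splunk_kv(event):
    # Produces e.g.: time="2020-09-20", url="http://evil.example", score="120"
    # Splunk extracts these fields automatically at search time.
    return ', '.join(f'{key}="{value}"' for key, value in event.items())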

One more note when using Azure Functions with Blob Storage: the Splunk Azure Cloud add-on can read the content of the blob storage via API, which saves an additional firewall rule if the add-on is already in use.
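
To get the log into Blob Storage in the first place, a blob output binding on the function is enough. A sketch, assuming a binding named outputblob (type blob, direction out) is declared in function.json; the sample log line is stand-in data:

import azure.functions as func

def main(mytimer: func.TimerRequest, outputblob: func.Out[str]) -> None:
    # The binding target (storage account / container / blob path)
    # is configured in function.json, not in the code.
    log_lines = ['url="http://example.test", score="120"']  # stand-in data
    outputblob.set('\n'.join(log_lines))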

Conclusion and next steps

It is always exciting to see what tools and scripts the IT security community has to offer. Very skilled people make their work available to everyone for use and education. This is inspiring in many cases, as it provides incentives and ideas for new creative analyses, alerts and use cases.

Nevertheless, the effort involved should always be taken into account: not only continuous maintenance, but also further development. In many cases, though, that is the most exciting challenge.