(By: Gil Raviv. Originally published here)
In this walkthrough we will show step-by-step how to turn web traffic logs (W3C Extended Log File Format) into a powerful security analytics workbook. To watch other episodes of Excel for Security Analysts go here.
Web traffic logs are generated by web proxies and application firewalls to track inbound and outbound traffic and report detection of viruses or unapproved content.
Using Excel 2016, we will import the file, shape it with the Query Editor which is based on Power Query technology, load it to the Data Model and get security insights with PivotTable and DAX formula.
Ready to go?
Download the Zip file from here, extract the file Demo_log_004.log and save it to your computer.
- Start Excel 2016 (If you didn’t try it yet, you can download it from here, or use Excel 2013 with Power Query Add-in).
- On the Data tab, in Get & Transform group, click New Query, click From File, and then click From Text.
Note: If you use Excel 2013, on the POWER QUERY tab, click From File, and then click From Text.
- The Import Data dialog will open. Go to the dropdown menu Text Files (*.txt) and select All Files (*.*), browse to your saved log file Demo_log_004.log folder, select that log file, and click Import.
The Query Editor will open. Here is where you can see a preview of the data, shape it and then load it to the workbook. This technology allows you to work with big data, define transformations on a preview of the data, and then load the transformed data to your workbook.
You can see in our scenario that the Query Editor shows all the logs on a single column. On the fourth row you can see the header line that defines the fields of the logs (#Fields: date time …).
Let’s filter out the lines that do not contain log data or “#Fields:”
- Click the filter button on the right side of Column1, then click Text Filters, and click Does Not Begin With…
- The Filter Rows dialog will open. Set it to reflect the following logic:
does not begin with #
begins with #Fields
- Right click on the header of Column1 and click Replace Values…
- The Replace Values dialog will open. Set the Values To Find edit box to “#Fields:”, keep the Replace With edit box empty, and click OK. This action will keep header lines with space-separated field names.
- Right click on the header of Column1, Select Split Values… and click By Delimiter…
- The Split by Delimiter dialog will open. Select Space under Select or enter delimiter, then select At each occurrence of the delimiter, and click OK.
We now have a column for each field in the log, but the first row should actually be served as headers.
- Click the table icon on the top left corner of the grid (Highlighted in red rectangle below), and click Use First Row As Headers.
Now let’s filter out all the header lines that starts with “date”.
- Click the filter button on the right side of the date column header, then click Text Filters, and click Does Not Begin With…
- In the Filter Rows, type “date” in the edit box which is next to does not begin with, and click OK.
We can now change the types for several columns:
- The type of “date” column should be set to Date.
- The type of “time” column should be set to Time.
- The type of “time-taken” column should be set to Whole Number.
- The type of “cs-bytes” column should be set to Whole Number.
- The type of “sc-bytes” column should be set to Whole Number.
Here is how you can change column types with the Query Editor:
- Select the column by clicking its header. On Home tab, click Data Type menu, and click the new type according to the data types above.
- You can also change the type by right clicking the header and change the type in the context menu.
That’s it. We have finished the query editing, and built the necessary transformation that will fit a table in the Data Model.
- In Home tab, click Close & Load To…
- The Load To dialog will open. Click Only Create Connection, then check Add this data to Data Model, and click Load.
Now Power Query will process the log file, transform it according to the steps you have defined in the Query Editor, and will load the data into the Data Model.
It’s time for some data analytics:
- On the Insert tab, in the Tables group, click PivotTable.
- The Create PivotTable dialog will open. Select Use this workbook’s Data Model, and click OK. Note: Use this workbook’s Data Model radio button is introduced in Excel 2016. On Excel 2013 you should select Use an external data source, and click Choose Connection…, then on Existing Connection dialog, click the Tables tab, click Tables in Workbook Data Model and then click Open.
You can now start creating a variety of PivotTables to detect viruses, outliers and suspicious web hosts.
Check out the Search box in PivotTable Fields pane. This is a new functionality in Excel 2016. It was also available with the PowerPivot Add-in for Excel 2010, but didn’t make it to Excel 2013. Now it’s back.
- In the Search box type “host”.
- Drag and drop the field “cs-host” to the ROWS box.
- You can immediately in the PivotTable the list of all the Web hosts in the log file. Drag and drop the “cs-host” to the Values box, and you can see how many web transactions were made for each host.
But let’s see if we can create a new measure that divide the outgoing traffic by the incoming traffic per host.
For this purpose, we will share with you a great feature in Excel 2016 – The ability to add measures from the PivotTable field well.
- In the PivotTable Fields pane, right click on Demo_log_004 table in PivotTable Fields pane, and click Add Measures…
- The Measure dialog will open.
- In Measure Name box type “Upload/Download”
- Inside the formula bar type:
= SUM([cs-bytes]) / SUM([sc-bytes])
You will notice that Excel provides you a variety of options to choose from as you type in your DAX formula.
- Click OK
- Scroll down to the bottom of the fields, and drag the measure “Upload/Download” to Values box. You can also just search for the measure in the Search box.
Let’s sort the PivotTable by the values in descending order, and see which hosts “take more and give less”.
Oh, this is a suspicious AD Server, and look at the extremely high rate of Upload/Download.
We may consider blocking outgoing traffic to that server 🙂
To learn more on the possible analytics you can build, and learn some complex queries that can parse this log format, download this workbook.
Return here to follow the next episodes. Coming next – handling multiline SSH logs…