Checking a sample of South African ID Numbers in Excel

Posted on March 11, 2015

All South African Identity numbers have a built-in check digit, and the formula to verify it isn’t a secret. This afternoon I needed to check a list of ID numbers to ensure they were all valid, and was most grateful to Robert MacLean (and Riaan Pretorius) for their work in putting together (and sharing) an Excel sheet which can validate the numbers. Just copy and paste in your sample.
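
For the curious, the 13th digit of an ID number is a check digit computed with the Luhn algorithm. Here is a minimal sketch of the same check in Python (the function name is mine, purely for illustration; the spreadsheet below does the equivalent with Excel formulas):

def is_valid_sa_id(id_number):
    # Return True if a 13-digit SA ID number passes the Luhn check.
    if len(id_number) != 13 or not id_number.isdigit():
        return False
    total = 0
    # Walk the digits right to left, doubling every second digit (standard Luhn).
    for i, ch in enumerate(reversed(id_number)):
        d = int(ch)
        if i % 2 == 1:
            d = d * 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0

print(is_valid_sa_id("8001015009087"))  # True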

Find a copy of this useful spreadsheet here : ID Number checker

Cross posted on ITSec’s blog.


Analysing SCCM Logs with Log Parser Studio

Posted on June 21, 2014

Microsoft System Centre Configuration Manager (SCCM) is used in an Active Directory environment to, amongst other things, deliver patches and antivirus updates to servers and workstations.

In a large environment you can have quite a number of SCCM servers operating in tandem to deliver the data to the client devices. Obtaining details of exactly which endpoints are being served by which servers (or how busy each server is) isn’t quite as straightforward as one might imagine. The client devices all connect to the servers through HTTP calls to collect the packages they require, so the IIS logs from the SCCM servers can be downloaded and analysed to try to figure out what is happening in the environment.

The IIS logs contain a number of useful fields (a sample included below) :

LogFilename : E:\logs\SCCMP101\SCCMP101\W3SVC1\u_ex140410.log
LogRow : 5
date : 41739
time : 36526.7730902778
c-ip : 10.75.xx.xx
cs-username :
s-sitename :
s-computername :
s-ip : 10.98.xx.xx
s-port : 80
cs-method : GET
cs-uri-stem : /SMS_MP/.sms_aut
cs-uri-query : MPLIST
sc-status : 200
sc-substatus : 0
sc-win32-status : 0
sc-bytes : 608
cs-bytes : 124
time-taken : 15
cs-version :
cs-host :
cs(User-Agent) : SMS+CCM

Further fields such as Cookie, Referer, event and process type are present, but I found these to be generally blank. The above example includes the data transferred (sc-bytes and cs-bytes), which are not logged by default and which I found quite useful. These fields can be activated in IIS easily enough.

In my use case I obtained the logs from 83 servers, which amounted to 2859 files over 254 folders, coming to 122GB uncompressed. This proves to be a bit of a challenge when I don’t have SQL Server installed on my PC or laptop, MS Access is limited to a 2GB database, and even SQL Server Express 2014 is limited to 10GB.

I had previously heard of (but not used) Microsoft Log Parser. A quick search revealed version 2.2 (released some 9 years ago in 2005 – but don’t let that put you off) available for download. Now this is a pretty nifty tool as it understands log files in many different formats (and even plain text files). This saves you from having to clean up the log files to strip out headers (which are written to the logs every time the web server starts / restarts) and from having to combine many different files. A real time-saver.

You can then write SQL-like queries and have them executed against your log files to get your results. With data of the size above, on my 8GB i7 870 @ 2.93GHz running off a Seagate 2TB 7200rpm SATA drive, it takes around 3.5 hours to run a query (if you know how to speed this up fundamentally, do share). Using Task Manager to monitor the execution of the query shows CPU utilisation of only around 8% (one thread sits at about 50% utilisation), memory utilisation from about 50MB upwards (growing as the result set grows), and disk throughput varying from about 8MB/s to 30MB/s. So I’m not quite sure where the bottleneck lies.

Writing the queries at the command line is a little bit of a pain, so the next bit of genius is Log Parser Studio. Unlike Log Parser, the Studio is kept up to date, with the latest build being from 23 May 2014. The Studio provides a library of over 170 queries covering logs from ActiveSync, IIS, Windows event logs, Exchange and Outlook Web Access, amongst others. It covers a huge number of ideas for useful analysis and provides the queries to do it for you. What is great is that you can use these as-is, modify them for your own purposes, or create your own from scratch, and add them all to your own personal library.

For example, to understand what the patterns of access look like over a month, a query such as this can be pretty useful:

/* Server total time and data transferred by date */
/* '[LOGFILEPATH]' is Log Parser Studio's placeholder for the selected log files */
SELECT s-ip, date, COUNT(s-ip), SUM(time-taken), SUM(sc-bytes), SUM(cs-bytes)
FROM '[LOGFILEPATH]'
GROUP BY s-ip, date

A challenge that you quickly come across is that both Log Parser and the Studio (which is dependent on the Parser) are 32-bit applications, so you need to be careful about which fields and how much summarisation are included in the result set. If the results grow too quickly then the query will crash as it runs out of memory. The trick is to find a balance between too much information (it crashes) and too little information (you need to run many queries). Finding the right balance means further analysis can be done in Excel or Access. I have found that 30 000+ rows still works if few enough fields are chosen and some smart queries are used.

When executing the above query, a problem is that the bytes transferred exceed the maximum value that can be stored in the data type, so you end up with negative numbers for some of the rows returned. Rewriting the query as follows helps resolve this problem:

/* Server total time and data transferred by date, scaled down to avoid overflow */
SELECT s-ip, date, COUNT(s-ip), SUM(DIV(time-taken,60)), SUM(DIV(sc-bytes,1024)), SUM(DIV(cs-bytes,1024))
FROM '[LOGFILEPATH]'
GROUP BY s-ip, date

Log Parser supports a huge number of different functions, but you need to know the format to use them. Take a look here for a listing and examples :

So whereas I would have expected sum(time-taken/60) to give me a result in minutes, it fails with an unknown field error. Even sum((time-taken)/60) fails. Checking the function reference shows that Log Parser wants it as SUM(DIV(time-taken,60)), and then life is happy again. Having to run your query again after just spending 3 hours waiting for the last one to complete is rather less pleasant.

Using these tools and queries I was then able to summarise the 160 gig of source data down to a few thousand rows that could be imported into Access and joined to CMDB data to produce really useful intelligence that can be analysed directly in Access, Excel or Gephi. Thanks Microsoft for these great free products.
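
The join itself was done in Access, but purely as an illustration of that step, here is a rough pandas sketch; the file and column names are my own assumptions, not taken from the environment above:

import pandas as pd

# Join the Log Parser summary to a CMDB extract so each server IP picks up
# its hostname, site and role (file and column names are assumptions).
summary = pd.read_csv("logparser_summary.csv")   # s-ip, date, daily totals
cmdb = pd.read_csv("cmdb_extract.csv")           # ip, hostname, site, role

enriched = summary.merge(cmdb, left_on="s-ip", right_on="ip", how="left")
enriched.to_csv("sccm_traffic_enriched.csv", index=False)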

I was also looking for further pre-built queries for Log Parser Studio but was unable to find any. If you know where more may be found, please do share. If there is interest, and as I use the tool more, I will be happy to share anything I have created.


Visualising Security Data : SCCM patching traffic flows

Posted on March 03, 2014

I have been experimenting a little recently with visualisation of security data.

We have had some challenges with SCCM and needed to understand which clients were connecting to which servers, where and why. This data seemed very hard to come by, and after some discussions with some helpful Microsoft South Africa folk, the service provider pulled the IIS logs from most of our SCCM Primary Servers and Distribution Points.

I then added in a Destination column (being the server from which the log was pulled) and combined the logs from all of the servers (6 Primary and 6 Distribution). In MS Access I then summarised the data by source and destination pairs, providing 13952 connections. This was exported as a CSV and headings added in using Notepad (Gephi wouldn’t read the data file without headings named to its liking).
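
As a rough sketch of that summarise-and-export step (the column names c-ip and Destination are assumptions based on the description above, and the file names are mine), the same edge list could be produced with a few lines of Python:

import csv
from collections import Counter

# Count connections per (source, destination) pair from the combined logs.
pairs = Counter()
with open("combined_iis_logs.csv", newline="") as f:
    for row in csv.DictReader(f):
        pairs[(row["c-ip"], row["Destination"])] += 1

# Write an edge list with the headings Gephi's importer expects.
with open("edges.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["Source", "Target", "Weight"])
    for (source, target), weight in pairs.items():
        writer.writerow([source, target, weight])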

The data was then loaded into Gephi as edge data. I then searched for each of the 12 servers in the node table, added a Label, changed the colour and size (red 30 for Primary, blue 20 for Distribution), selected the Force Atlas layout and let it plot the data. The 13 546 nodes and 13 952 edges were then plotted, providing the graph below (exported as PDF).

The graph was somewhat unexpected in that I did not foresee so many of the workstations being served from Primary servers nor so many devices receiving data from multiple servers. A few of the DP’s (top and bottom of screen) clearly are not serving the numbers of workstations we would expect and need deeper investigation.

While Excel cross-tabs and more detailed Access queries provide deeper insight into what is going on, this visualisation very quickly demonstrates a very different picture from the one the service provider running the SCCM infrastructure had been describing.

Have you done anything similar? Please do share.


Map of SCCM links using Gephi


Download the PDF version here :  map of sccm v2


Optimising Audit Command Language (ACL) Coding

Posted on July 06, 2010

I was working with a colleague the other day on trying to debug some Audit Command Language (ACL) code that was doing an interest recalculation on a moderately sized dataset of some 1.7 million records. The recalculation was taking around 70 minutes to run, we were getting significant errors in our results, and we were quite frustrated. We had taken a small sample of records (12 months for two accounts) and done some “manual” recalculations in Excel, so we could see our code was producing the same results on the test sample, but we were getting much bigger differences on the full record set.

Debugging was just getting really frustrating given that it was taking over an hour after each change to see whether the difference was sufficiently reduced to give a number that wasn’t material.

After working on the code for a morning, I decided that there had to be a better way to do this, so we started doing some rough profiling of the ACL code. Looking through it, there were a number of duplicated calculations and a number of duplicated RECOFFSET() reads. We spent some time assigning these to calculated variables to reduce the number of reads and recalculations. It didn’t make much difference, shaving only a few minutes off the overall time.

We looked at our hardware, a Dell E6400 laptop with a high-speed SSD connected via eSATA. We tried running the calculations off the slower internal drive and found that they ran at the same speed, which was somewhat worrying. At this point we also realised that a simple summing of the data took less than 15 seconds to read all the records and produce a total, so something other than the data reads was causing the huge slowdown.

We started hacking our code to pieces (ignoring the interest calculation part for now) and discovered that the calculations weren’t causing much of the overhead; the only thing that reduced the times was removing the RECOFFSET() calls. Removing these reduced the run times by a factor of 100+.

Now we realised that the code would have to be rewritten, almost from scratch. It took about an hour to recode the calculations into GROUP blocks, and boy, what a difference. The first run through of the new code took a mere 26 seconds. Sure, the calculations were way out, but debugging code that runs so much quicker is a hang of a lot easier.

A couple of hours later and the code was now correctly calculating the interest to within materiality.

What we learned :  RECOFFSET() in ACL seems to be terribly slow and should be avoided for anything but the smallest datasets. Rather invest the time and write the code using GROUP blocks. The speed increases are more than significant and will reward you handsomely in the longer run.
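
The rewrite essentially turns a per-record look-back into a single sequential pass that carries the previous record's values forward. A hypothetical Python sketch of that pattern (not ACL syntax, and the field names are mine):

def recalculate(rows):
    # rows: records sorted by account number and transaction date.
    previous_account = None
    previous_rate = 0.00000
    results = []
    for row in rows:
        if row["account"] != previous_account:
            previous_rate = 0.00000      # reset carried state when the account changes
        # ... the interest recalculation using previous_rate would go here ...
        results.append((row["account"], previous_rate))
        previous_account = row["account"]
        previous_rate = row["rate"]
    return results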

The other little catch: don’t forget that it’s not only numbers that get their type cast by the initial assignment, it’s also strings. So using ASSIGN previousinterestrate = 0.00000 ensures that your interest rate will keep five decimals, rather than being chopped to an integer as it would be if you used ASSIGN previousinterestrate = 0.

ASSIGN previousaccountnumber = "12345678" gives you an 8-digit account number, whereas ASSIGN previousaccountnumber = "xx" will chop it to just two characters, even if you later assign to it values from fields containing 8-digit account numbers. That one had us stumped for about an hour: the interest rates and previous transaction dates weren’t resetting when the account number changed. After fixing our initial assignment (as above), everything worked like magic.

Hope that helps some avid ACL coders out there to do things a little more efficiently.

The work above was performed with ACL Desktop Edition Version 9.1 found here on the website.
