This summer I got the chance to work with BiG Consultancy whilst finishing my Master's course in Computational Finance at Exeter University. I was interning at Crowdcube, an equity crowd-funding platform, who use the BiG team as their data experts. I first met the team at BiG in my third week, and I ended up enjoying it down there so much that I worked from their office for the remainder of my internship!
My task was to help implement a data-driven process to find companies suitable for crowd-funding.
BiG were on track to achieve this through machine learning, with a data set that was deep and rich enough to support it. However, before we could deploy the (in my view) more glamorous machine learning algorithms and gain the speed efficiencies they offer, there was some necessary grunt work involved in building our own clean and complete data set. I was shown how to call APIs in Java, a programming language I had never used before, and write the results to a database.
The project began with warehousing the data in Amazon Redshift. However, due to the large volumes of data we were collecting, we moved over to a distributed database system, Cassandra – now that’s Big Data! Random Forest, an algorithm built on an ensemble of decision trees, was the algorithm of choice. A training set of pre-classified companies was used to train the model, and the rules it learned were then applied to evaluate the larger data set. By the time I had finished my internship, we had created an environment capable of producing hundreds of leads a week, with the model becoming more and more intelligent as feedback on lead quality was fed back in.
My time at BiG definitely complemented my academic work, where I was focusing on applying Monte Carlo methods to price path-dependent options. This is a process where the price of the underlying asset is simulated thousands of times and the discounted payoffs are averaged to estimate the price of the option – a technique which is useful when there is no analytical solution available.
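As a minimal sketch of the idea, and not the code used in the project, the Python below prices one kind of path-dependent option by plain Monte Carlo. Everything specific here is an assumption made for illustration: the option is an arithmetic-average Asian call, the asset follows geometric Brownian motion, and the function name asian_call_mc and all parameter values are hypothetical.

```python
import math
import random


def asian_call_mc(s0, strike, rate, sigma, maturity, steps, n_paths, seed=0):
    """Estimate an arithmetic-average Asian call price by plain Monte Carlo.

    Assumes the asset follows geometric Brownian motion (an illustrative choice).
    """
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    dt = maturity / steps
    drift = (rate - 0.5 * sigma ** 2) * dt
    vol = sigma * math.sqrt(dt)

    total_payoff = 0.0
    for _ in range(n_paths):
        price = s0
        running_sum = 0.0
        for _ in range(steps):
            # Advance the simulated price by one time step.
            price *= math.exp(drift + vol * rng.gauss(0.0, 1.0))
            running_sum += price
        # The payoff depends on the average price along the whole path.
        average = running_sum / steps
        total_payoff += max(average - strike, 0.0)

    # Discount the mean payoff back to today.
    return math.exp(-rate * maturity) * total_payoff / n_paths


# Hypothetical inputs: spot 100, strike 100, 5% rate, 20% volatility, 1 year.
estimate = asian_call_mc(100.0, 100.0, 0.05, 0.2, 1.0, steps=50, n_paths=20000)
print(f"Estimated Asian call price: {estimate:.3f}")
```

Because the payoff depends on the average along each path rather than only the final price, the whole path has to be simulated, which is why simulation is a natural fit for this kind of option.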
Both projects had a heavy emphasis on data and programming, with the methods and theories applied feeding into each other. For example, I found the role of random number generators very interesting, in particular how they could be replaced with low-discrepancy sequences in order to reduce variance, whilst still keeping the “random” properties needed for the Monte Carlo method to work. This meant that the option price would converge to the correct solution more quickly than with the standard approach.
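To illustrate the swap from pseudo-random numbers to a low-discrepancy sequence, here is a small Python sketch under stated assumptions. It uses the base-2 van der Corput sequence, which is only one of several possible low-discrepancy sequences, and it prices a plain European call rather than a path-dependent option, purely because that one-dimensional case has a Black-Scholes closed form to compare against. The function names and parameter values are hypothetical.

```python
import math
import random
from statistics import NormalDist


def van_der_corput(index, base=2):
    """Return the index-th point (index >= 1) of the van der Corput sequence."""
    point, denom = 0.0, 1.0
    while index > 0:
        index, digit = divmod(index, base)
        denom *= base
        point += digit / denom
    return point


def european_call_estimate(uniforms, s0, strike, rate, sigma, maturity):
    """Average discounted call payoffs, mapping uniforms in (0, 1) to normal draws."""
    inv_cdf = NormalDist().inv_cdf
    drift = (rate - 0.5 * sigma ** 2) * maturity
    vol = sigma * math.sqrt(maturity)
    payoffs = [
        max(s0 * math.exp(drift + vol * inv_cdf(u)) - strike, 0.0)
        for u in uniforms
    ]
    return math.exp(-rate * maturity) * sum(payoffs) / len(payoffs)


def black_scholes_call(s0, strike, rate, sigma, maturity):
    """Closed-form Black-Scholes price, used here only as a reference value."""
    cdf = NormalDist().cdf
    d1 = (math.log(s0 / strike) + (rate + 0.5 * sigma ** 2) * maturity) / (
        sigma * math.sqrt(maturity)
    )
    d2 = d1 - sigma * math.sqrt(maturity)
    return s0 * cdf(d1) - strike * math.exp(-rate * maturity) * cdf(d2)


# Hypothetical inputs: spot 100, strike 100, 5% rate, 20% volatility, 1 year.
params = (100.0, 100.0, 0.05, 0.2, 1.0)
n = 4096

rng = random.Random(1)  # fixed seed so the run is repeatable
pseudo_random = [rng.random() for _ in range(n)]
low_discrepancy = [van_der_corput(i) for i in range(1, n + 1)]

exact = black_scholes_call(*params)
mc_price = european_call_estimate(pseudo_random, *params)
qmc_price = european_call_estimate(low_discrepancy, *params)

print(f"Black-Scholes reference: {exact:.4f}")
print(f"Pseudo-random estimate:  {mc_price:.4f} (error {abs(mc_price - exact):.4f})")
print(f"Low-discrepancy estimate: {qmc_price:.4f} (error {abs(qmc_price - exact):.4f})")
```

The only difference between the two estimates is the list of uniform numbers fed in. The van der Corput points fill the unit interval evenly instead of clustering by chance, which is the property that lets the estimate settle down with fewer samples. A path-dependent option would need a multi-dimensional sequence (for example Halton or Sobol), which this sketch does not attempt.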
Even simple rules of programming that I was taught at BiG, such as how to structure code and export results to a database, helped me when it came to writing up my report and visualising the results.
Big Data is an extremely fascinating topic and I would advise anyone with a passion for numbers to read into it. Thanks to the team at BiG for teaching me so much; it was great having the chance to work with you!