Tower Interview Question
Please complete questions 1, 2 and 3 below. For the Python problems, please use Python 2.7 and feel free to make use of any basic linear algebra routines available in NumPy, but please do not use any libraries that may solve any other non-trivial parts of the problems. For example, if any aspect of the problem requires you to compute a linear regression, then please do so from first principles rather than using the functionality built into SciPy.

1) Write a C++ program which reads PITCH data from standard input and, at the end of the input, prints a table of the top ten symbols by executed volume to standard output. For example, your table should look something like this:

    SPY   24486275
    QQQQ  15996041
    XLF   10947444
    IWM    9362518
    MSFT   8499146
    DUG    8220682
    C      6756932
    F      6679883
    EDS    6673983
    QID    6526201

The PITCH specification is available from the BATS website. In short, you'll need to read Add Order messages and remember which orders are open so you can apply Order Cancel and Order Executed messages. Trade Messages are sent for orders which were hidden. You'll need to use both Order Executed and Trade Messages to compute total volume. For simplicity, ignore any Trade Break, long messages, Auction Fill and Routed Trade (‘B’, ‘r’, ‘d’, ‘C’, ‘R’).

We've included a portion of live PITCH data in a file named pitch_example_data. (Note that each line in the sample file begins with an extra character, S, not mentioned in the specification. That can be ignored.) Along with your source code, you should provide a makefile with the target pitch_parser. Your compiled binary should be invoked as follows:

    cat pitch_example_data | ./pitch_parser

2) Given a feature variable, x, and a corresponding response, y, write a Python program called local_linear.py to estimate the regression function f in the model y = f(x) + e using a local linear kernel estimator and k-fold cross-validation to select the bandwidth. For simplicity, you may assume that a Gaussian kernel should be used.
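As a rough illustration of the estimator named in question 2, the sketch below fits a weighted straight line around each evaluation point using Gaussian weights, then picks a bandwidth by k-fold cross-validation. It is only one possible reading of the task, not a reference solution. The function names (local_linear, cv_error, select_bandwidth), the interleaved fold assignment and the candidate bandwidth grid are all assumptions made for illustration, and plain Python lists are used instead of NumPy to keep the example dependency-free.

```python
import math


def local_linear(x0, xs, ys, h):
    """Local linear estimate of f(x0) using a Gaussian kernel of bandwidth h."""
    s0 = s1 = s2 = t0 = t1 = 0.0
    for xi, yi in zip(xs, ys):
        d = xi - x0
        w = math.exp(-0.5 * (d / h) ** 2)  # Gaussian kernel weight
        s0 += w
        s1 += w * d
        s2 += w * d * d
        t0 += w * yi
        t1 += w * d * yi
    det = s0 * s2 - s1 * s1
    if det <= 1e-12 * s0 * s2:
        # Degenerate weighted design: fall back to a local constant fit.
        return t0 / s0 if s0 > 0 else float("nan")
    # Intercept of the weighted least-squares line centred at x0.
    return (s2 * t0 - s1 * t1) / det


def cv_error(xs, ys, h, num_folds):
    """Mean squared held-out prediction error for bandwidth h."""
    n = len(xs)
    total = 0.0
    for fold in range(num_folds):
        # Assumed fold scheme: observation i belongs to fold i % num_folds.
        train_x = [xs[i] for i in range(n) if i % num_folds != fold]
        train_y = [ys[i] for i in range(n) if i % num_folds != fold]
        for i in range(fold, n, num_folds):
            total += (ys[i] - local_linear(xs[i], train_x, train_y, h)) ** 2
    return total / n


def select_bandwidth(xs, ys, candidates, num_folds):
    """Return the candidate bandwidth with the smallest cross-validated error."""
    best_h, best_err = None, float("inf")
    for h in candidates:
        err = cv_error(xs, ys, h, num_folds)
        if err < best_err:  # NaN errors never compare as smaller
            best_h, best_err = h, err
    return best_h
```

Under these assumptions, fitted values at the training points would be obtained by calling local_linear(x, xs, ys, h) for each x with the selected h. In a real submission the inner sums would more naturally be vectorised with NumPy, and the folds would usually be shuffled first.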
You should also provide functions for predicting new values and plotting the estimated regression function. The required inputs to your program are two files, each with n rows and 1 column of numeric data (one containing the x variable and the other containing the y variable), an output file path, and the number of folds. At its most basic level, the program could be invoked as follows:

    python local_linear.py --x xin --y yin --output output --num_folds 10

In output, you would write an n x 1 column of numeric data containing the values of the fitted function evaluated at the points found in xin. Your program should also support two optional arguments: --plot, which displays a scatter plot of y against x with the fitted function drawn over it, and --xout, which takes the path to a further n x 1 file of numeric data, so that output contains the values of the fitted function evaluated at these points rather than at the training points found in xin. Example data sets xin and yin are provided.

3) Let X be an n x p matrix of features and y be an n x 1 response vector. Suppose you are given X'X and X'y rather than the raw data. Write a Python program to implement forward stepwise linear regression using the data you are given. Forward stepwise linear regression adds features iteratively, so that the feature added in each iteration is the one that most reduces the residual sum of squares. You may assume that the user has a predefined stopping point, k, for the number of features they want in their model. Your algorithm should be optimised for computational complexity and should deal with any potential numerical issues arising from (near) collinearity. Your program should be invoked as follows:

    python stepwise.py --xx xx --xy xy --max_iterations 20 --output output

Here xx contains the p x p matrix X'X and xy contains the p x 1 vector X'y. Output should contain a k x p CSV in which each row contains the fitted coefficients at one iteration of the forward stepwise search. For example, row k should contain the fitted coefficients after entering k features into the model. You should also provide a brief document explaining how your algorithm works. Example data is again provided.
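One way to approach question 3 without the raw data is the sweep operator applied to the augmented matrix [X'X | X'y]. The sketch below is a hedged illustration of that idea rather than the expected answer. The names sweep and forward_stepwise, the tol parameter and the relative collinearity threshold are assumptions introduced here. After sweeping on the selected features, the diagonal entry of an unselected feature j is its residual sum of squares given the current model, and the last column holds its residual cross-product with y, so the reduction in residual sum of squares from adding j is the squared cross-product divided by that diagonal entry.

```python
def sweep(m, k):
    """Sweep the p x (p+1) matrix m in place on pivot k."""
    d = m[k][k]
    ncols = len(m[k])
    for i in range(len(m)):
        if i == k:
            continue
        f = m[i][k] / d
        for j in range(ncols):
            if j != k:
                m[i][j] -= f * m[k][j]
        m[i][k] = f
    for j in range(ncols):
        if j != k:
            m[k][j] /= d
    m[k][k] = -1.0 / d


def forward_stepwise(xx, xy, max_features, tol=1e-10):
    """Return (selection order, coefficient path) from X'X and X'y."""
    p = len(xy)
    # Working copy of the augmented matrix [X'X | X'y].
    m = [list(map(float, xx[i])) + [float(xy[i])] for i in range(p)]
    diag0 = [m[j][j] for j in range(p)]
    in_model = [False] * p
    active, path = [], []
    for _ in range(min(max_features, p)):
        best_j, best_gain = None, -1.0
        for j in range(p):
            # Skip features already entered or (nearly) collinear with
            # the current model: their residual variance is ~ zero.
            if in_model[j] or m[j][j] <= tol * diag0[j]:
                continue
            gain = m[j][p] ** 2 / m[j][j]  # drop in residual sum of squares
            if gain > best_gain:
                best_j, best_gain = j, gain
        if best_j is None:
            break  # every remaining feature is numerically redundant
        sweep(m, best_j)
        in_model[best_j] = True
        active.append(best_j)
        # Swept rows hold the current least-squares coefficients.
        path.append([m[j][p] if in_model[j] else 0.0 for j in range(p)])
    return active, path
```

Each sweep costs O(p^2), so entering k features costs O(k p^2) and no matrix is ever inverted from scratch. The tolerance check is one simple guard against near collinearity; a production version might prefer a pivoted Cholesky update and NumPy arrays.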