C# and CSV's

Hey all!  

Got an assignment recently for an interview to a company handling investment and share selling data.  They actually gave me a well thought out problem that they have to solve all the time prior to the interview.  The requirements were to take a CSV file of Market Transactions and generate reports based on the data.

The Assignment

Now, I only had 2 days to work on the submission, and I was working full-time during those days.  I also probably haven't updated the project since then, so just a little warning there.  Below are the requirements.  I probably only hit about 60% of the bullets.

Summary: Write code to consume data in a CSV file, and generate reports.

You may use any language, libraries or tools you want, as long as you will be
able to demonstrate your solution in person. Code that you can email to the
interviewers ahead of time usually works best, but other means of
demonstration, such as a laptop with the necessary tools loaded, would be fine
as well.

Input Specification: CSV file. First line is a header. Columns are:
    TXN_DATE - date transaction took place
    TXN_TYPE - type of transaction (BUY/SELL)
    TXN_SHARES - number of shares affected by transaction
    TXN_PRICE - price per share
    FUND - name of fund in which shares are transacted
    INVESTOR - name of the owner of the shares being transacted
    SALES_REP - name of the sales rep advising the investor

Output Specification:

    1. Provide a Sales Summary:
        For each Sales Rep, generate Year to Date, Month to Date, Quarter to
        Date, and Inception to Date summary of cash amounts sold across all
        funds.

    2. Provide an Assets Under Management Summary:
        For each Sales Rep, generate a summary of the net amount held by
        investors across all funds.

    3. Break Report:
        Assuming the information in the data provided is complete and accurate,
        generate a report that shows any errors (negative cash balances,
        negative share balance) by investor.

    4. Investor Profit:
        For each Investor and Fund, return net profit or loss on investment.

Be prepared to discuss:

    Testing strategies
    Difficulties you had and how you solved them

Suggestion:

    Your code should be correct first, have good style and comments second, and
    then be clever/fast as a distant third.

The Code

I chose to write the project with a .NET Core back-end to consume the CSV Data and an Angular front-end to display the reports.  The .NET Core server code uses the CSV Helper library to consume and generate strong-typed entities from the CSV file.  I do this here in the code: https://github.com/mitchgollub/DotNetCore.Angular.SalesCSVReader/blob/05c78f9f3dd7b130e9ad566dc387b4ff7bf762ec/Repositories/TransactionRepository.cs#L24-L34

CSV Helper is written and maintained by Josh Close and is taking PR's on GitHub.  More info on CSV Helper here: https://joshclose.github.io/CsvHelper/.

Writing the reports in Angular helped me easily grab aggregated data from the back-end server and display it in tables on different pages.  I could define a ViewModel for each report with all of the completed columns and have the server send over the rows.   That happens here: https://github.com/mitchgollub/DotNetCore.Angular.SalesCSVReader/blob/05c78f9f3dd7b130e9ad566dc387b4ff7bf762ec/ClientApp/src/app/sales-rep-summary/sales-rep-summary.component.ts#L11-L23


Things I would have done differently

So, given that the CSV was rather short, I had not run into any performance issues running the solution as it was.  However, the architecture would easily crawl to a halt when aggregating a large dataset.  There are two changes I could make to allow the application to scale: Reduce iterations over the dataset and minimize the amount of times the code reads the CSV file.

One example of the iterations I did here shows four aggregates that pass over the data multiple times.  

I can reduce the number of passes through the data by leveraging a LINQ Aggregate.  The Aggregate calculation will be heavy in that it would calculate the total monetary value from the transaction using the CalculateTransaction function, then check each transaction for a date matching date range to be added to the totals for Year-to-Date, Month-to-Date, etc. However, with a significantly large dataset, the performance benefit of passing the data only once will outweigh the extra computation.  More on the Aggregate function here: https://docs.microsoft.com/en-us/dotnet/api/system.linq.enumerable.aggregate?view=netcore-2.1

Another point I would improve would be the code reuse on the Angular app.  Each component is a copy and paste job from the original .NET Core/Angular template project from Microsoft.  Given more time, I'd break out the HttpClient calls to a ReportDataService that could be injected to each component.  Then there could be a function on the ReportDataService that calls a wrapper function around the reused HttpClient.get() commands.  I could even merge the .NET Core endpoints into one Controller, return all the report aggregate data in one ViewModel, and just make one HttpClient call.  Merging the endpoints would reduce code duplication in the .NET Core app as well.

The final major update that could be implemented would be some graceful failing in the event that the CSV structure changes.  As the app stands at the time of writing this, it is semi-resilient to CSV changes.  If new columns are added, they will simply not be picked up by the code.  However, if the CSV file moves, the existing column names change, or existing columns are removed, the server will throw a 500 error when trying to return a report.  

Implementing this resilience might be a little more involved.  I'd imagine it'd be good to store a backup CSV file from the last successful run of the application.  That way if the file has an error during data retrieval, we can fall back on the backup file.  This will hide the issue, so proper logging and notifications should be put in place to alert the party involved with the maintenance of the CSV file can make adjustments.  If the business need requires showing that the file is ill-formatted, we should bubble that message up to the UI (in a friendly way) so the user is aware and can handle the situation accordingly.

In short, working with CSV files in .NET Core isn't so bad.  CSVHelper made reading the file incredibly easy.  I'd still prefer a database to store the data, but for some custom integrations, CSV's are a very real part of people's workflows.  The project itself was impossible to complete in a short amount of time, but I'm happy with what I was able to make.  One day maybe I'll go back and it round out with the items I listed that I would change.