Business Analyst Skills: Hard Technical Skills You Need To Have
What skills does a business analyst need to have? In the video below, you can find a good explanation of the hard technical skills a business analyst needs for modern research.
You can find out what a business analyst does, and what kind of skills a business analyst needs to have. You can watch the story below.
Read the video transcript below.
My name is Alex Petrelli. I work in business analytics at an algorithmic trading firm in New York City, and today I'll be talking about hard technical skills for modern research and business analysts.
I remember when, about three years ago, I was graduating from university, actually McGill University here in Montreal, and I had studied economics and finance. I was going to be a consultant, an analyst, at a consulting firm in Boston.
So I went through a formal training program where I learned the analyst's weapon of choice.
This is Microsoft Excel. And I got good at Microsoft Excel. I learned VLOOKUPs, index matches, pivot tables; I memorized, and still have memorized, all of the keyboard shortcuts you can imagine.
And I was convinced at the time that if I used my understanding of economics and my understanding of finance, together with this powerful tool Excel, I would be able to create insightful and impactful analyses that would impress my managers and clients.
I was convinced of this. I was also naive.
So this is what Excel hell actually looks like. At the bottom, we have so many tabs open that we can't tell which data transformation leads to the next. In the middle, we have a file that's just too big, and Excel refuses to open it. At the top, on both sides, we have multiple Excel workbooks, and as soon as you move one workbook to another location, all the references break.
For me, this came to a head when I was tasked with pulling data from a website and bringing it into Excel. I had to go through every page on this website, copy and paste all the data into Excel, and format it.
And this was not a matter of hours; this was literally days of copying and pasting. If programmers can imagine that, this is what analysts do. This is actually what they do.
So I vowed I would never do this process again. It was so awful and miserable and thankless that what I learned was that I needed to automate this stuff. I needed to learn how to program.
So I did what everyone does when they first learn something new: I googled it. And this is what I saw. There is just too much software, too much terminology, too many concepts to figure out where to begin.
So what I decided to do was assign myself a project: a very simple analytical project, to learn the bare minimum set of tools I needed to get the job done.
I didn't want to deal with anything else. So I'd go to this website, the IGM Economic Experts Panel. It's a series of surveys sent out to prominent economists in the US, asking questions like: do you think interest rates will rise or fall in the next quarter? How confident are you? Ten being most confident, one least confident.
So here's a very basic analytical question:
What’s the average level of confidence given by all the economists on this survey?
So I would simply go into each survey. In each survey, I find a data table like this, and I just take the average of these numbers.
Not so hard. Of course, what I actually had to do was first learn Python, just to begin with. That would give me a general programming language to actually automate anything. But Python alone is insufficient, so I had to learn Requests on top of it, which would allow me to send GET requests out to websites. And that would yield me my HTML. But even when I had this HTML, it was still insufficient, because it's just a blob of text, so I had to parse it using BeautifulSoup. I would parse it for the data I wanted and ignore everything else.
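As a sketch, the scraping step he describes might look something like this. The table layout here is invented for illustration; a real IGM survey page would need its own inspection, and the live fetch (shown only in a comment) would use the page's actual URL:

```python
from bs4 import BeautifulSoup

# In the real workflow you'd fetch a survey page first, e.g.:
#   import requests
#   html = requests.get("https://example.com/survey/1").text  # hypothetical URL
# Here we parse an inline snippet shaped like the confidence table described above.
html = """
<table>
  <tr><th>Economist</th><th>Confidence</th></tr>
  <tr><td>Smith</td><td>7</td></tr>
  <tr><td>Jones</td><td>4</td></tr>
</table>
"""

soup = BeautifulSoup(html, "html.parser")
rows = soup.find_all("tr")[1:]  # skip the header row
confidences = [int(row.find_all("td")[1].text) for row in rows]
print(confidences)  # [7, 4]
```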
Now, I think what's common to a lot of people, a lot of programmers, when they learn something new is that they do way too much of it. So I extracted all the data I could, I scraped all the data I could, I opened the CSV file in Excel, and of course, what happens? Excel refused to open it.
So this kicked off the next step of my journey. I had to learn SQL, and I had to learn databases, to actually store this information so that I could access it later.
And I had to learn SQL not in an analytical sense, the way most analysts are familiar with it, your SELECTs, your GROUP BYs, your WHEREs, but in a data engineering sense.
So this was your data models, your CREATE statements, your INSERT statements, your TRUNCATEs. And only then was I able to have a dataset with which I could perform my analysis.
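A minimal sketch of that data-engineering SQL, using SQLite as a stand-in database. The table and column names are illustrative, not from the talk:

```python
import sqlite3

# Create a schema and load the scraped rows into it.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

cur.execute("""
    CREATE TABLE responses (
        survey_id INTEGER,
        economist TEXT,
        confidence INTEGER
    )
""")
cur.executemany(
    "INSERT INTO responses VALUES (?, ?, ?)",
    [(1, "Smith", 7), (1, "Jones", 4), (2, "Smith", -99)],  # -99 codes a null
)
conn.commit()

cur.execute("SELECT COUNT(*) FROM responses")
print(cur.fetchone()[0])  # 3
```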
I had to do all this first before I could even do any analysis.
So now I run the analysis: select the average confidence from the economists. Of course, it's so simple.
This is what I see.
So what did I forget to do?
I forgot to check my data for null values, and had I done that, I'd have seen that null values are coded as negative 99. Those got averaged in there.
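The trap he hit can be shown in a few lines. The numbers below are made up, but the mechanics are the same: the -99 null codes drag the naive average far off:

```python
# Null responses coded as -99 poison a naive average.
confidences = [7, 4, 6, -99, 5, -99]  # illustrative values

naive = sum(confidences) / len(confidences)       # -99s averaged in
valid = [c for c in confidences if c != -99]      # drop the coded nulls
cleaned = sum(valid) / len(valid)

print(round(naive, 2))    # -29.33
print(round(cleaned, 2))  # 5.5
```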
More importantly, what I forgot to do was exploratory data analysis, or EDA. EDA is about visually, just like we have here, looking at your data, understanding it, and familiarizing yourself with it.
So what types of values do I get in a column? What are the columns labeled? What do those columns mean? What are the data types? You'll probably have to check a lot of documentation for this, or codebooks, or data dictionaries. And only after you've done this process can you perform data cleaning.
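That EDA pass might look like this in pandas, on an illustrative frame. The column names here are assumptions, not the real survey schema:

```python
import pandas as pd

# A quick first look: dtypes, distribution, and coded nulls.
df = pd.DataFrame({
    "economist": ["Smith", "Jones", "Lee", "Chen"],
    "confidence": [7, 4, -99, 6],
})

print(df.dtypes)                        # what types each column holds
print(df["confidence"].describe())      # a min of -99 flags the null coding
print((df["confidence"] == -99).sum())  # how many coded nulls there are
```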
You have to do this first before you can actually remediate those errors in your data. I want to emphasize this step because it's so important. The fact is, at the end of the day, nobody will look at your data like you will as the analyst. Your managers won't look at it, your clients won't look at it, your customers won't look at it. Nobody will look at this data except you. So it's very, very important that you know it very well.
And now we've finished the engineering pipeline. We've pulled our data, we've stored our data, we've understood it, and now we have a clean dataset with which we can work.
An analysis doesn't start with data, and it doesn't start with statistics; it starts with a question. And answering this question is a very iterative process.
So you will ask a question, and you will learn from your data iteratively. You might think that sales managers drive all of the revenue, and ask which sales managers drive all of the revenue. But you'll find, only through a give and take with your data, that it's not actually sales managers at all; it's which sales territories are important. So along the way, as you're iterating on this, you'll be data wrangling. And this is not the SQL we had before; this is a more analytical SQL. So this is your SELECTs, your GROUP BYs, your joins, your window functions if you're fancy. You can also use pandas or R here, or MATLAB.
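That analytical SQL might look like this sketch, again using SQLite, over made-up sales data. The table, names, and numbers are all illustrative:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE sales (territory TEXT, manager TEXT, revenue REAL)")
cur.executemany("INSERT INTO sales VALUES (?, ?, ?)", [
    ("East", "Ann", 120.0), ("East", "Bo", 80.0),
    ("West", "Cy", 50.0),  ("West", "Di", 40.0),
])

# Which territories drive revenue? A plain GROUP BY answers it.
cur.execute("""
    SELECT territory, SUM(revenue) AS total
    FROM sales
    GROUP BY territory
    ORDER BY total DESC
""")
totals = cur.fetchall()
print(totals)  # [('East', 200.0), ('West', 90.0)]

# And a window function, if you're fancy: rank managers within each territory.
cur.execute("""
    SELECT manager, territory,
           RANK() OVER (PARTITION BY territory ORDER BY revenue DESC) AS rnk
    FROM sales
""")
print(cur.fetchall())
```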
And then we get to what we're paid for. This is it: data analysis. Our namesake.
What I think a lot of people believe about analysis is that it's heavily quantitative. It's all residual plots and t-tests and regressions and machine learning algorithms, and it can be all of that.
But often it's just this. It's summary statistics. It's basic, it's to the point, and what's useful about these is that they're directional. They tell me where to look in my data: what's signal versus what's noise, what should I pay attention to, what are the patterns I should look for, and what's everything else I should ignore.
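A directional first pass really can be that plain. For example, Python's statistics module over some made-up revenue figures:

```python
import statistics

# Summary statistics point you at the interesting part of the data.
revenue = [120.0, 80.0, 50.0, 40.0, 300.0]  # illustrative figures

print(statistics.mean(revenue))    # 118.0
print(statistics.median(revenue))  # 80.0
print(max(revenue))                # 300.0 -> the outlier telling you where to look
```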
Now, did anyone actually enjoy looking at this, though? Despite how useful I say it is, this is awful to look at; everyone can agree. What people actually want to see is something like this. It has the same information, it actually has more information, but it's much more impactful, and it's much easier to tell where the pattern is.
So I’m gonna make a bold proposition here. Every analysis should end with visualization.
Because at the end of the day, no matter how many smart people we have sitting in a room, nobody wants to look at a table full of numbers. Nobody does. They want to see something like this.
And you can do the visualization on the left, which is complex but cool and neat. Or you can do what I often do, on the right, which is simple, it's to the point, and it tells you exactly what you need to know very quickly.
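A simple, to-the-point chart of the kind he favors can be a few lines of matplotlib. The territories and numbers are illustrative:

```python
import matplotlib
matplotlib.use("Agg")  # render without a display
import matplotlib.pyplot as plt

# One bar per territory: the pattern jumps out in a way a table never does.
territories = ["East", "West", "North", "South"]
revenue = [200, 90, 150, 60]  # illustrative figures

fig, ax = plt.subplots()
ax.bar(territories, revenue)
ax.set_ylabel("Revenue")
ax.set_title("Revenue by territory")
fig.savefig("revenue_by_territory.png")
```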
So we've finished the data pipeline. We actually had to do quite a bit of engineering before we jumped into the analysis, and there are quite a few steps that go into it. So we're done, that's it. That's the data pipeline.
One more thing. Sanity checking.
So I can tell you from experience how easy it is to spend days on the engineering work, then days on the analysis after that, and then you get your first set of numbers, you've got your visualization, and you send it off.
And your manager looks at it, and your manager goes: that data right there, that didn't happen. It just didn't happen. We never had those sales.
So what actually happened, probably, is that a process or a job ran twice, it wrote twice to our database, our data got doubled, and our revenue got doubled.
So what I urge you to do after you finish any analysis is to sanity check your data. Spend one extra hour, two extra hours. This is not a matter of intelligence or skill; it's just grit and perseverance. It's awful, but do it. You'll thank yourself, because you'd rather find the error yourself than have someone else find it.
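A sanity check for the doubled-job scenario above can be as cheap as comparing a total before and after de-duplication. The rows here are made up:

```python
# A job that ran twice doubles every row, and the revenue total with it.
rows = [
    ("2023-01-01", 100.0), ("2023-01-02", 150.0),
    ("2023-01-01", 100.0), ("2023-01-02", 150.0),  # duplicated by a re-run
]

total = sum(r[1] for r in rows)
dedup_total = sum(r[1] for r in set(rows))  # exact-duplicate rows collapse

print(total)        # 500.0 -> looks like a great quarter...
print(dedup_total)  # 250.0 -> ...until you de-duplicate

# A mismatch here is a cheap red flag worth catching before you hit send.
```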
Two takeaways from this talk. The first is data skepticism. You should be skeptical of the results that you produce, and you should be skeptical of the results that others produce, because, as we've seen, there are so many places to go wrong in the data pipeline. You could scrape data incorrectly, you could store it incorrectly, you could clean it improperly. So be skeptical.
The last point is full-stack data analysis. I think analysts for too long have been siloed in a skill set that focuses only on BI tools and Excel, and they can benefit greatly from embracing an engineering mindset, a software development skill set, to be able to work with a data pipeline and work with their data from start to finish.