Mike Jelen

Financial Forecasting using Python & Snowflake


Forecasting and budgeting are traditionally done for General Ledger, Sales, and Manufacturing data. You can take a similar approach to predicting the price of Bitcoin using Python.

Snowflake’s Data Marketplace has many free data shares across most domains and industries to use as a starting point.

Check out our Meetup video below to watch an overview of this event:

Other Meetups and Events to check out:

Transcript from the Meetup event:
So welcome to another awesome Carolina Snowflake Meetup group event. Today we are going to be going over financial forecasting using Python and Snowflake. We just like to go over some quick basic rules for everybody. 

We always try to be respectful of everyone, stay on mute, and keep the chat for any questions you might have, trying to keep it on topic. But we’re always open to adding anything that might be off topic in the discussion board in our Meetup group.

And you do not have to have your video on if you’re not comfortable with that, but if you are, you certainly can. And we always ask that you recommend or invite a friend and hit us up on Twitter. And we’re also on LinkedIn and Facebook and Instagram. 

So first we’re going to start out with an introduction to forecasting and Bitcoin, and we’ll be doing some Bitcoin data exploration and some data modeling. That will go into feature scaling and engineering, and we’ll be using Facebook Prophet and linear regression for the modeling.

And then we’ll discuss next steps and have an open discussion and talk about the upcoming events we have going on. All of our previous meetups can be found on our YouTube channel and they’re usually up there in one to three days.

We try to get them out there as quickly as possible, and you can find the link to our YouTube channel in the discussion board, but I can also throw it in the chat later if anybody’s interested in checking out some of those other videos we have.

And this one, too, once it’s up. I’ll let Christian take over on this Snowflake refresher. Hey, guys. So this is the Snowflake meetup. So just, again, some people are very familiar with Snowflake, some people aren’t, but again, just a quick checkpoint.

So Snowflake is the Data Cloud, or cloud Data Warehousing platform. We think it’s awesome. It’s got a lot of great features for enabling data ingestion, data governance, governed access to data, and shareability of data.

And it’s near infinite scale. So it’s become a very extraordinary tool in the world of analytics these days. And really, it’s out there solving the problem of taking a myriad of disparate and heterogeneous data sources and allowing different functions throughout the process of transformation, access, normalization, and aggregation with great performance, great speed, and great access on top of it all.

And then, of course, allowing access to that data to turn into information by a myriad of data consumers, enabling machine learning workloads, operational reporting, and so forth and so on. So we’re big fans. 

Hopefully you are as well. And I’ll turn it over to, let’s see here. Oh, well, yeah, let’s do a quick introduction to forecasting, and then I’ll hand this off to Anupav. So this meetup is really talking about if we’ve got Snowflake, which might be the repository for our data.

If you want to do some straightforward functions such as forecasts and budgeting, how would we do that? How about we take advantage of the Snowflake ecosystem? So when we talk about forecasting, maybe we’re looking at general ledger data, sales data, manufacturing data.

So it’s kind of a horizontal and vertical mix when we look at forecasting and budgeting data. But you can also think of forecasting along the lines of machine learning for things such as general anomaly detection, identifying outliers and things like that in the data.

So, you know, really any linear regression use case kind of fits the bill here. But you can maybe drill down into things like, you know, recovering lost revenue, or potentially identifying customers that don’t immediately fit a perfect profile or segmentation.

And then there’s lots of other industry use cases such as, like in healthcare, right? Just trying to determine maybe drug dosage to blood pressure relationships in patients and things like that, which we’re seeing a lot more use cases out of healthcare these days, for obvious reasons I won’t go into right now.

But that’s just kind of a general intro to forecasting. And we’re going to spend pretty much the rest of the time today talking about that. Yeah, I’ll just talk quickly on bitcoin and basically what it is. I’ve got to grab my notes because I’m kind of new to the subject myself, but basically what it is is: users are requesting to purchase a Bitcoin on the Bitcoin network, and miners are picking up these requests and creating blocks, or hash keys, which become a blockchain, which is basically a general ledger.

And that’s basically the gist of Bitcoin. So I will send it to Drew for some data exploration. Hey, everybody. So I think with regards to data analysis and data exploration, one of the first things that is so incredibly pertinent is to have a problem in mind or a focus or a scope in mind. 

I don’t know if you guys have come across this issue when you have done data analysis in your past, but I know that when I am just thrown a data set, I end up tooling around with it and trying to figure out what I can do with it. 

And I spend way too much time not getting anything out of it. But if I approach a data problem with kind of a scope or a problem in mind, that’s where I’m most effective because that’s how I know that I can isolate in the data set what I’m actually trying to get out of it. 

So today our focus is going to be mainly cryptocurrencies, and more specifically, we want to look at the Bitcoin daily closing price and see how we can forecast that using linear regression and the Facebook Prophet model as well.

So along with that, we actually need to figure out how available our data is. So what we’ll need ideally is time series data to solve this issue, something that’s going to show us daily snapshots of our Bitcoin daily closing price in US dollars.

And in order to do this, we’ll probably look in the area of the Snowflake data marketplace which has a plethora of open source data that we are able to readily access via Snowflake data shares. And maybe we’ll either want to look at a couple of other crypto finance data sites to do any sort of web scraping, get anything out of that that we might not have been able to get out of the Snowflake Data marketplace.

And ultimately, once we get into that data, we’ll want to figure out the attributes of interest and really drill into how we can get the most out of the data such that we can start to take that data and move it into a forecasting model.

So, like I’ve said, the first place we want to look is the Snowflake Data Marketplace, and more specifically, Knoema, a company that shares a bunch of data. They have dozens of data sets ranging from health data, specifically like COVID data, to finance data, and you name it.

They do a lot in just providing open source data. And with regards to Knoema, we’ll want to look at their finance data and one data set, this CMCCD2019 data set, which gives us kind of a glance at daily snapshots of crypto value and supply and has other metrics as well.

So that kind of goes back into what I was talking about earlier with us wanting time series data. This is going to give us that time series data, and we’ll also take a look at or we’re also going to get our data from Bit Info Charts, which is a site that keeps various cryptocurrency metrics from average block size to mining profitability to average transaction size.

You name it. They’ve got all of your miscellaneous cryptocurrency metrics. And in order to be able to do that, we’ll have to scrape from that website. I will start to get into a walkthrough. But before we do that, on the screen ahead of you is just kind of a look at the different data sets that Snowflake offers as part of that Finance Data Atlas.

They’ve got dozens of data sets, and we’re specifically going to look at one. So, Heather, can I share the screen? I’ll go ahead and stop sharing, and I’ve enabled it for you.

Awesome. Thank you. All right, can everybody see my screen? Yes. All right. So, right in front of me, just briefly wanted to show you the Snowflake Data Marketplace. And like I was mentioning earlier, it’s a place where if you want data, you can find it here.

There are tons of open source, free data sets to be able to access and query via Snowflake’s data shares. And for example, if you want to look at real estate data, Knoema actually has a real estate data set or data package.

And Equifax has open source data as well. Really, anything that comes without this little personalized tag here at the bottom, like right here with Equifax, is going to be free data. You can just get the data and then click the role you want to use, and then it’ll just pop up in Snowflake and you’ll be able to query it like it was your own data.

But getting back on topic here, I was talking a little bit earlier about the CMCCD2019 data set that is part of Knoema’s Finance Data Atlas package. So if we’re looking at this fresh, we don’t necessarily know what data is actually part of this data set.

So let’s just take a look real quick to see what is in there. So just doing a quick top 100 on the table. We can see that we’ve got a couple of different columns with cryptocurrency than the actual measures that are being calculated here.

So this would be that they’re calculating reference rate in bitcoin. But if we just want to look at the definition of it, it’s the price of an asset quoted in bitcoin. So we’ve got definitions, measure types, measure names, you name it.

And then ultimately, what we really want is our time series data, like I mentioned earlier. So we find our date and we find our value. So if you’re thinking x and y values, x being the date, Y being the value, so that’s great.

However, we want to look for the daily closing price. So ultimately, let’s see if we can boil down the different measures, the different types of metrics that are being calculated in this data set.

So we see that there’s 156 different calculations that they’re making on each different type of cryptocurrency. Let’s try to boil it down even more to see if we can find our closing price that we want.

And all I’m doing here is I’m just trying to get a distinct listing of these measures. And right now I’m going to try and filter it using the ILIKE operator that Snowflake has, which is basically a case-insensitive pattern-matching operator.

I’m just trying to say that I want any measure name that has price within it. So I run this and I see that I’ve got two, so I want the one that’s in US dollars.

So I’m going to tab this and keep it for later as my measure name. But ultimately I want to find bitcoin as my cryptocurrency. So if I don’t know the actual name or how it’s being referred to in the data set, what I’m going to do is I’m just going to do the same thing here and I’m going to see that we’ve got 393 different cryptocurrencies that metrics are being calculated for and same thing.

I’m going to just see if I can’t find which ones include the term bitcoin. So run this again and I see that my third record is bitcoin. So I’m also going to use this BTC Identifier in this last query that I’ve got and I’ve mentioned.

So I want cryptocurrency equal to BTC bitcoin, and I want my measure to be price in US dollars, kind of like we said on this last query up here. And voila, we’ve got our time series data now: basically our bitcoin closing price for every day.
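
For reference, here is a rough sketch of the exploration queries described above, written as SQL strings run through the Snowflake Python connector (they can just as easily be run in a Snowflake worksheet). The database, schema, table, and column names are assumptions based on this walkthrough and may differ in your account.

```python
# Hedged sketch only: object and column names are assumed from the walkthrough.
import snowflake.connector

conn = snowflake.connector.connect(
    account="<account_identifier>",   # placeholders -- supply your own credentials
    user="<user>",
    password="<password>",
    role="ACCOUNTADMIN",
    warehouse="COMPUTE_WH",
    database="FINANCE_DATA_ATLAS",    # assumed name of the Knoema share
    schema="FINANCE",
)
cur = conn.cursor()

# Peek at the table
cur.execute('SELECT * FROM "CMCCD2019" LIMIT 100')

# Distinct measure names containing "price", matched case-insensitively with ILIKE
cur.execute("""
    SELECT DISTINCT "Measure Name"
    FROM "CMCCD2019"
    WHERE "Measure Name" ILIKE '%price%'
""")
print(cur.fetchall())

# Final time-series query: daily Bitcoin closing price in US dollars
cur.execute("""
    SELECT "Date", "Value"
    FROM "CMCCD2019"
    WHERE "Cryptocurrency Name" ILIKE '%bitcoin%'
      AND "Measure Name" ILIKE '%price%usd%'
    ORDER BY "Date"
""")
rows = cur.fetchall()
```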

So we’ve got one record per day, it’s got our closing price, and that’s what we wanted to get out of this. So this is the data that we’re going to be using to do our financial forecasting and, yeah, I’ll turn it back to Anupav.

Thank you. Do you guys see a Jupyter notebook? Yes. Thank you, Heather. And thank you, Christian, for setting up a strong foundation and clearly explaining that we’ve used the Knoema data set from the Snowflake Marketplace; it’s given us a good view of what the data set looks like.

But now the idea is to extend from Snowflake, connect it with Python, and that’s where we can play around with machine learning and implement some of the machine learning models. So what we’ve done is we’ve taken the data set from Knoema, we’ve combined it with more data, which is basically a web-scraped data set, so we have more information around bitcoin prices and around transactions on the blockchain, and then we’re using machine learning models, linear regression and Facebook Prophet.

The idea is to see if we can come up with some intelligent predictions. So going forward, not spending too much time, but it will help to have a quick idea of what the Bitcoin and blockchain process looks like.

Let’s assume that you have a user which starts a request for bitcoin. It goes into a bitcoin network where you have a miner who picks up the request for bitcoin and it goes on to mine a block. And then essentially that block is assigned a hash value, which is nothing but a specific key value to a block.

And then all of these blocks kind of tie together to form a general ledger, which is basically the blockchain. So that’s very high level information, but it’s good to understand that you have these blocks, and these blocks have their own sizes, contain transactions, are created by miners, and have their own miner profitability and so on.

So all these features are something which will be taken into account as we go into prediction of Bitcoin prices. As we go ahead with this machine learning model, our target variable is going to be the end-of-day closing price of Bitcoin.

This slide talks about the Knoema data set. I won’t spend too much time, but the idea is that Knoema covers cryptocurrencies and gives a lot of information on its platform. And moving on to the Python code, as you can see, we import libraries for the Snowflake connector, which I’ll talk more about.

It connects the Python Jupyter notebook to Snowflake, and then you can interact with Snowflake and so on. Then you’ve got your standard pandas and NumPy libraries to do the pandas code. And then you have your plotting libraries, matplotlib, Seaborn, and so on.

And then starting here you have scikit-learn for the machine learning specific libraries. This piece of code is essentially where you connect your Python Jupyter notebook with Snowflake.

And then this piece of code kind of creates the authentication part, connects you. And then once you have an object created, you can kind of execute code from Python, just like executing code on Snowflake.

So, for example, for this particular scenario, I’m sharing the context where I’ve used the ACCOUNTADMIN role. I’ve set my warehouse, set my database. If you guys want to know more about databases, warehouses, and roles, Christian has covered this in the past; you can go back to the AICG

GitHub and website to understand more about this. And this is what your tables and your views look like. So you have the Finance Data Atlas assets that we imported in Snowflake and we just walked over.

So let’s just go on to the next one. And this piece of code just executes a select and fetches all data from the table we just spoke about around cryptocurrency. So if I just run this right now, you see that the data gets pulled into a data frame.
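
As a rough sketch of that step, assuming the same connection parameters as the earlier exploration sketch, the full table can be pulled straight into a pandas DataFrame with fetch_pandas_all() (which needs the pandas extra of the connector installed):

```python
# Sketch: pull the whole table into pandas. Requires
#   pip install "snowflake-connector-python[pandas]"
import snowflake.connector

conn = snowflake.connector.connect(
    account="<account_identifier>", user="<user>", password="<password>",
    role="ACCOUNTADMIN", warehouse="COMPUTE_WH",
    database="FINANCE_DATA_ATLAS", schema="FINANCE",   # assumed share names
)
cur = conn.cursor()
cur.execute('SELECT * FROM "CMCCD2019"')   # roughly 18M rows in this walkthrough
df = cur.fetch_pandas_all()                # Arrow-backed fetch into a DataFrame
print(df.shape)
df.head()
```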

As a data frame, it takes a few seconds and then we can go ahead. So if anyone has any questions, feel free to ask before I go ahead. Anyways, while the execution happens, it might take a while.

So we’ve got this data frame here using the head method, which gives you an idea of what the data set looks like. So we’ve got a lot of categorical variables, and then we’ve got the closing prices from the Knoema data set, which are aligned to a date, so you have your value and your date and so on.

And if you do some more EDA, you see that the range of the values runs from 2009 to 2022; that’s the range of the data available. And then we have values for around 393 different cryptocurrencies. So, for example, you have Bitcoin, you have Litecoin, Dogecoin, and so on, and it’s obviously dominated by Bitcoin.

Around 40% of transactions have been for bitcoin. But this information is not enough to make good predictions, because we need more data to have a good set of features. So what we’ve done is we have a source where we collected the web-scraped data.

This is the Bit Info Charts data I spoke about, which gives me a lot more information around open price, highest price, lowest price, closing price, and so on, all mapped to a date. So what we’ve done is we’ve taken the web-scraped data, and if you guys want to look at the source, we’ve mentioned the source of the data, and we’ve merged it with the Knoema data to create a more extensive data set right here.

And this is what you see: after combining, I have more features that I can use as input to my machine learning model. Hey, Anupav. Sorry to interrupt. Kumar had a question in the chat about how many rows it fetched from Knoema.

Yes, I believe so. I can check that right now. And there was also a question about whether the password has been encrypted here for authentication. I’ll answer the first question right now.

18 million records are being fetched from this data set. And you can see it gives you 18 million records across 14 columns. That answers one of them. And for the second question, Heather, around the password, yes.

He asked, for authentication, has the password been encrypted here? So what we’ve done is we’ve hidden that particular cell. If you look at this particular cell, that’s where we’ve actually put the username, password, and account.

So we’ve put them as variables and we’ve hidden that cell rather than displaying it. So that’s how we’ve connected for now. And now we have one data set which combines the two, and moving on to the EDA part, this is just a standard piece of code to plot the data, but the idea is to look at the value over time.

So for example, the value is nothing but the closing price of Bitcoin, starting from 2009 to almost May 2022. And you can see that it was pretty stable between 2013 and 2017, and after 2017 there’s been a huge increase in the price of Bitcoin.

It almost reached $60,000, and I think last time I checked it’s hovering around $40,000. And that does make sense, because that’s what the trend line kind of looks like. But it’s just good to have an idea of how the price has been fluctuating over the years.
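
A minimal version of that plot, assuming the merged DataFrame from the previous step is named `df` and has lowercase `date`, `cryptocurrency`, and `value` columns (the column names and the BTC label are assumptions):

```python
import matplotlib.pyplot as plt
import pandas as pd

# Filter to Bitcoin and sort chronologically (column names and BTC label assumed)
btc = df[df["cryptocurrency"].str.contains("BTC", na=False)].copy()
btc["date"] = pd.to_datetime(btc["date"])
btc = btc.sort_values("date")

plt.figure(figsize=(12, 5))
plt.plot(btc["date"], btc["value"])
plt.title("Bitcoin daily closing price (USD)")
plt.xlabel("Date")
plt.ylabel("Closing price (USD)")
plt.show()
```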

Then, to do more EDA on the features, we created a correlation matrix. As a quick recap, a correlation matrix just explains the correlation of your features with each other. The correlation value basically ranges between plus one and minus one.

So if you have a value towards plus one, it indicates positive correlation, meaning the values increase together, and if it’s towards minus one, it basically means negative correlation. So I plotted this diagram for all the features I have here using a Seaborn heat map of the correlations.

If you look at this and at the bar on the right here, colors towards the dark or green end indicate negative correlation, and colors towards the other side indicate positive correlation.

So looking at this, it does make sense that your mining profitability kind of goes down as the number of transactions increases and as the blockchain block size increases. And this gives you an indication of how features are behaving and correlating with each other.
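
A short sketch of that heat map, assuming the merged numeric feature DataFrame is named `features` (the name is an assumption):

```python
import matplotlib.pyplot as plt
import seaborn as sns

corr = features.corr()              # pairwise correlations, each in [-1, +1]
plt.figure(figsize=(10, 8))
sns.heatmap(corr, cmap="RdYlGn", vmin=-1, vmax=1, center=0)
plt.title("Feature correlation matrix")
plt.show()
```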

I’ll just add a few observations here. So once we’ve understood the features, what features matter and which are important, we move on to the step of feature engineering. And when I say feature engineering, it’s very important to talk about it, because this is time series data, right?

If I just take this daily data and then just plug it into my machine learning model it will not be able to capture all the trends and all the connections that might happen in terms of time related parameters.

So we engineer and create new features. In this scenario there are multiple ways of creating new features. For example, you can take moving averages, you can create standard deviations, and I’ve mentioned the code for a lot of them, but I’ll only cover a few.

And we’ve used the pandas library called pandas_ta, which basically stands for pandas Technical Analysis. It creates those technical analysis indicators and parameters that can be used as features.

So I’ll take an example. What we’ve done is, given the features that we already have from the data set, we created moving averages for those features. So we created seven-day moving averages, 30-day, 90-day, and so on.

So for each feature we’ve created this. And you can see your data starts to kind of smoothen out as you increase your averaging period. And that does make sense. As you keep increasing the averaging period, it tends to smoothen out.

So similarly, we’ve created standard deviations as extra features, and that’s just executing the moving average code for each feature. We’ve created a standard deviation for each of the features that we have, and eventually all of these features are taken into account as part of our input data set.

So I’ve just taken two scenarios, but there’s a lot more you can do. You can also use weighted moving averages, you can use exponential moving averages. That can all be done by using the pandas_ta library.
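
A sketch of those rolling features with pandas_ta, assuming `btc` has a `close` column holding the daily closing price (the DataFrame and column names are assumptions):

```python
import pandas_ta as ta

# Simple moving average, rolling standard deviation, and exponential moving
# average over 7-, 30-, and 90-day windows, each added as a new feature column.
for window in (7, 30, 90):
    btc[f"sma_{window}"] = ta.sma(btc["close"], length=window)
    btc[f"std_{window}"] = ta.stdev(btc["close"], length=window)
    btc[f"ema_{window}"] = ta.ema(btc["close"], length=window)

# Plain pandas equivalents, if you prefer to avoid the extra dependency:
# btc["close"].rolling(7).mean(), btc["close"].rolling(7).std()
```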

Once you have your features set, then you need to scale the features. When I say scale the features, that basically means that you might have different features with different ranges of values, and they need to be normalized to a common scale.

Otherwise your machine learning model will give you a wrong output, because you have different values on different scales. So there are multiple ways of scaling your data, and one way to do it is you can use a RobustScaler.

Again, these are all from the scikit-learn library. So you have RobustScaler, you have MinMaxScaler, and then you have StandardScaler. What MinMaxScaler basically does is that if your values are positive, it will scale the values between zero and one.

And if your values include negatives, it will scale between minus one and one. And since these values are prone to being impacted by outliers, the RobustScaler can sometimes help filter out the outliers. So if you call the RobustScaler and then call the MinMaxScaler, as you can see, all the values are scaled between zero and one, since they’re all positive.

And now this kind of unifies your data set and becomes a good source of data for modeling. I could have also used the StandardScaler, which is a very common way of scaling. But that assumes that you want your final distribution to follow a normal, Gaussian distribution; in that case it’s a great scaler.

But if you don’t want to change the underlying distribution, the MinMaxScaler will do just fine for you. And then after you finish the scaling, there are some checks you want to do. For example, I check for any null values.
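
A sketch of that scaling step with scikit-learn, assuming `X` is the numeric feature matrix (a DataFrame or array; the name is an assumption):

```python
import pandas as pd
from sklearn.preprocessing import MinMaxScaler, RobustScaler, StandardScaler

# RobustScaler centers on the median and scales by the interquartile range,
# so outliers have less influence; MinMaxScaler then maps values into [0, 1]
# (or [-1, 1] when negative values are present).
X_robust = RobustScaler().fit_transform(X)
X_scaled = MinMaxScaler().fit_transform(X_robust)

# StandardScaler (zero mean, unit variance) is the common alternative when a
# roughly Gaussian distribution per feature is acceptable:
# X_scaled = StandardScaler().fit_transform(X)

# Post-scaling sanity check for nulls before modeling
assert not pd.DataFrame(X_scaled).isnull().values.any(), "null values remain"
```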

I’ve done that internally as part of the code here. But check for null values, check for any blank values; you can fill in null values if you want. And then after that, what I’ve done is use a random forest regressor, which is basically one of the ways to indicate what the most important features are when your model tries to calculate or predict the target variable.

Random forest in itself is just a machine learning model, but if you want to understand which features are the most important, one particular approach is to fit your scaled data and your target data to a random forest model, and then it has the internal ability to tell you which features are important in terms of your output.

And again, this is another step of doing feature engineering. There are ways to do it. You can actually ensemble, you can take the output of these features and then plug it into some other model and use that to kind of get your prediction.

So there are a number of possibilities once you’ve got your important features. In fact, I’ve just listed, I think, the ten to twelve most important features here.

Ten important features, and it’s interesting because you can see that the closing price of Bitcoin is very closely linked with the highest price, with the moving average of the lowest price, and with the average transaction value.

And again I see different lowest prices and different moving averages of prices, and then the number of coins in circulation, which does make sense. So that’s it from a feature scaling and feature engineering standpoint: we’ve got the features aligned and scaled, and we’ve created the features that we want.
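
A sketch of that feature-importance step, assuming `X_scaled`, the target series `y`, and a `feature_names` list exist from the earlier steps (names are assumptions):

```python
import pandas as pd
from sklearn.ensemble import RandomForestRegressor

rf = RandomForestRegressor(n_estimators=200, random_state=42)
rf.fit(X_scaled, y)

# Rank features by the impurity-based importances the fitted forest exposes
importances = pd.Series(rf.feature_importances_, index=feature_names)
print(importances.sort_values(ascending=False).head(10))
```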

You can go ahead and do some modeling. So what I’ve done is, since my target is my value, which is the closing price, and I want to remove the date from it, I’ve created my training data set, which is basically your X, and then I have my target data set, which is basically the value of the target variable, again the closing price of Bitcoin.

It’s a good idea to divide your data set into training and testing. One note of caution: just for simplicity, I’ve divided it across an 80/20 split between train and test. But good practice is to have it divided across train, test, and validation.

So ideally you want to test your models on the validation data set first and use that to select a model. Once you have selected a model, then you can test it on the test data to make sure that you have the right values being predicted.

But over here it’s just an 80/20 split between train and test. So all my train data is in X_train and y_train, and the test data is in X_test and y_test.

For standard metrics, we have your mean absolute error and your root mean squared error, calculated as the metrics to measure the difference between the predicted and the actual values. And then what we’ve done over here is we fit on the training data, then we predict on the train data first, then on the test data, and after that we calculate the metrics to see what the output looks like.
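
A sketch of the split, fit, and scoring described above, assuming `X_scaled` and `y` from the earlier steps (with `shuffle=False` so the 80/20 split preserves time order):

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, mean_squared_error

X_train, X_test, y_train, y_test = train_test_split(
    X_scaled, y, test_size=0.2, shuffle=False)

model = LinearRegression().fit(X_train, y_train)
y_train_pred = model.predict(X_train)
y_test_pred = model.predict(X_test)

for name, actual, pred in [("train", y_train, y_train_pred),
                           ("test", y_test, y_test_pred)]:
    mae = mean_absolute_error(actual, pred)
    rmse = np.sqrt(mean_squared_error(actual, pred))
    print(f"{name}: MAE={mae:.2f}  RMSE={rmse:.2f}")
```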

And over here, you can see that the testing error is a lot higher than the training error, which basically indicates there might be a scenario of overfitting. It might or might not be, but usually a test error much higher than the training error could indicate a case of overfitting.

And what does that mean? So you basically have two scenarios where your model can overfit or your model can underfit. So if you make a very complex model, a model which is very sensitive, it’s over engineered, there might be a scenario that it’s going to overfit or get trained on noise which does not even exist, or it might fit so well on the training data that it does not generalize well to other data sets.

And similarly, if you don’t use the right amount of features and data, you might have an underfit model which might have missed out some of the patterns or some form of trends in your data, which is called underfitting.

So it’s basically all about balancing out your overfit and underfit to get the right data and the right model, and then to see what the output looks like. So you can see that we plot the output: y_test is the actual value and y_test_prediction is the predicted value, and then it plots the result.

You see that the model is able to catch the pattern, and in fact this model is actually doing a very good job of catching it, which makes it suspicious that there might be a degree of overfitting.

But yeah, since it’s a simple model, we’ve not done too much configuration. For example, one way of improving the model is that you could do cross-validation. So, as I said, having a validation set is a must.

And then you could do cross-validation to make sure that you have all possible scenarios covered. You could do a grid search optimization to make sure that you have the right parameters factored in, and so on.
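
A sketch of what that might look like, using scikit-learn’s TimeSeriesSplit so the folds respect chronological order; Ridge regression is used here only to give the grid a parameter to tune (an assumption, not what the session used):

```python
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV, TimeSeriesSplit

tscv = TimeSeriesSplit(n_splits=5)          # ordered folds, unlike plain k-fold
grid = GridSearchCV(
    Ridge(),
    param_grid={"alpha": [0.01, 0.1, 1.0, 10.0]},
    cv=tscv,
    scoring="neg_mean_absolute_error",
)
grid.fit(X_scaled, y)
print(grid.best_params_, -grid.best_score_)  # best alpha and its MAE
```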

Hey, Anupav, real quick, this is Christian. Just before you get to Facebook Prophet, I want to call out that somebody mentioned something earlier when you ran that pull from Snowflake to bring the data back; they said it was really fast.

I don’t know if you actually addressed it, but this is on your machine, right? Like, this is not some massive machine in the cloud or anything. What specs are you running this under? This is on my local machine, and I’m running on the M1 processor, Christian.

But I think for 18 million records, it took, what, 30 seconds? Approximately 30 seconds. So, yeah, that’s pretty fast. And then just a reminder, everything you’re showing right now is open source, right?

Well, we’re going to open source it and put it out there for everyone if they want to take a look at the code and so forth. Yeah, we’ve mentioned all the sources. Along with each piece of code, we also mention some interesting papers and reference points which supplement the code.

Cool. Yeah, just wanted to hit that checkpoint before you got to Prophet. It sounded like you were at a good breakpoint there. Alright. Similarly, you’ve got something like Facebook Prophet, which is, I believe, an open source time series forecasting model, and the entire idea of Facebook Prophet is that it takes away the complexity of considering all the time series complications and gives you a lot of flexibility to change parameters.

So even if you don’t clearly understand the complications of modeling time series, it encapsulates that for you, and then you only have to focus on the data being put into the model. So I would highly recommend these two references: this is a paper published by two Facebook engineers in which they explain how this model was built and the idea behind it.

And along with that, this documentation explains what parameters can be used to customize the Facebook Prophet model. What we’ve done is we’ve again built a very simple model. I use Anaconda for all my libraries

to keep it consistent. I have created a conda environment, loaded my file into it, and then similarly we downloaded fbprophet. I don’t remember if I was able to get fbprophet from conda, but if not, you can always do a pip install with the built-in Python package manager.

You can use that to get it done. So I’ve used the Prophet 0.7.1 version, and you need to be very specific with respect to the format you’re using to call the Prophet model. For example, it wants you to specify a data frame which has this header ds, which is basically the date, and y, which is nothing but the target value that you want to predict.

And then it wants you to specify this datetime format for the date. And this is something that you need to follow to get the data right for this model. And again, I’ve just taken 300 data points.

So for example, I’ve divided my data into training and test. I’ve taken 300 data points for my test and the remaining for my train. And then over here, as you see, when I call this model it just runs.

And then as you see, this is very similar to how you see a linear regression being executed. It tries to minimize the value of the log probability. You can see some form of gradients being worked on, and then other parameters like alpha and alpha zero.

Something very similar to a linear model analysis is happening here. And then once you’ve done that, you fit the model and you test on the test data. I’ve done the prediction, so let’s see what the plot looks like.
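
A minimal sketch of that flow with fbprophet 0.7.x, assuming `btc` has `date` and `close` columns (names are assumptions); Prophet requires the columns to be renamed to `ds` and `y`:

```python
import pandas as pd
from fbprophet import Prophet   # package name for the 0.7.x releases

prophet_df = btc.rename(columns={"date": "ds", "close": "y"})[["ds", "y"]]
prophet_df["ds"] = pd.to_datetime(prophet_df["ds"])

# Hold out the last 300 days as a test window, as in the walkthrough
train, test = prophet_df.iloc[:-300], prophet_df.iloc[-300:]

m = Prophet()
m.fit(train)

future = m.make_future_dataframe(periods=300)   # extend 300 days past training
forecast = m.predict(future)                    # yhat, yhat_lower, yhat_upper
m.plot(forecast)                                # forecast vs. history
```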

It’s a simple model, a lot of configuration is still required, and you can see that what we’ve predicted is way off from the actual value. So that means you’ve got to play around with the parameters. And I’ve specified some of the parameters you can go and configure as part of the Prophet model.

If you want more details, we’ve specified the link in the cell above where you can look at the documentation and see what can be configured. For example, I spoke about overfit and underfit. You have a parameter called changepoint_prior_scale where you can play around with the values and see how the model responds.

You can specify special holidays and special events as part of your modeling. And you can even change the model to consider a logistic curve instead of a linear curve if your data is better modeled in that particular fashion.
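
A sketch of those knobs; the values are illustrative only, and `my_holidays` is a hypothetical DataFrame of special events you would supply yourself:

```python
from fbprophet import Prophet

m = Prophet(
    changepoint_prior_scale=0.5,   # larger = more flexible trend, more overfit risk
    growth="logistic",             # logistic rather than linear trend
    holidays=my_holidays,          # hypothetical DataFrame with 'holiday' and 'ds' columns
)
# A logistic trend needs a carrying-capacity column on the training frame
train["cap"] = train["y"].max() * 1.1
m.fit(train)
```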

So these are two ways you could do a prediction. There are a lot of other ways: you could also use some of the classical time series models like ARIMA and so on. You can even use deep learning and neural networks and other approaches to do that.

That’s also a possibility. And I think, Heather, you can probably bring the slides up as the extension to what are the next steps from here? Sure, go ahead and stop sharing. I’ll share my screen. All right. 

So, yeah, we went through the modeling with Anupav. Thank you, Anupav. And for next steps, you’re going to push the model output back into Snowflake and build up your dashboards. And you can use any visualization tool you’re familiar with or like best.

Looker, Tableau, Power BI; there’s also Streamlit and Flask. And then you present it to your team. So as far as the takeaways of what we went over today: Snowflake does not have a native forecasting tool.

So you kind of have to bake your own solution. You can leverage other time series machine learning models, or you can leverage ensemble models. Other markets this modeling can be applied to would be the equity market, fixed income market, foreign exchange market, money market, derivatives, and financial statements as well.
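
As a sketch of that last-mile step, the Prophet forecast (or any model output) can be written back to Snowflake with write_pandas from the connector; the table name here is made up, and `conn` and `forecast` are assumed from the earlier sketches:

```python
from snowflake.connector.pandas_tools import write_pandas

forecast_out = forecast[["ds", "yhat", "yhat_lower", "yhat_upper"]].copy()
forecast_out.columns = ["DATE", "PREDICTED_CLOSE", "LOWER_BOUND", "UPPER_BOUND"]

# auto_create_table needs a reasonably recent connector; otherwise create the table first
success, nchunks, nrows, _ = write_pandas(
    conn, forecast_out, table_name="BTC_PRICE_FORECAST", auto_create_table=True)
print(success, nrows)
```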

So just to review, we went over forecasting and Bitcoin. We did some data exploration with Drew and the data modeling with Anupav, going over feature scaling and engineering and using Facebook Prophet and linear regression.

And we saw those visualizations there in his notebook. And if you’re interested in checking out our code, we make it available to everybody. Our GitHub link will be up in the discussion on Meetup, and we also try to share it in our descriptions on YouTube as well, so give us a star.

All right, I’ll let you guys take over. If anybody has any questions they want to ask, the technical guys can handle them. Out of curiosity, on the bitcoin piece here, when you think of some of the variables to help predict the future swing of the price of bitcoin, any theoretical thoughts on what data, even if it is magical at this point, might be of interest to put into the model to help predict the price?

Number of Elon Musk tweets? How many times he’s been on SNL talking about that coin. Not just for bitcoin, but for any financial instrument, we’ve considered historical data as a point of reference for the model.

But in fact it’s a common belief that only using historical data is not the right approach, because if that were the case, viewers would have been millionaires by now just using historical data. There’s a lot of complexity beyond historical data and the values derived from it.

So it’s going to be historical data plus a certain amount of randomness. Now, basically, how well can you model the randomness? That’s the catch. That is also the interesting part of it.

That’s the other piece of randomness. And then there’s deep sentiment analysis; you’ve got to search all corners of the web to see it. Granted, that’s a short-term spike up or down; it doesn’t really predict the future other than maybe a day or two out.

All right, so we are also developers of the DataLakeHouse platform here at AICG, so you should check out our site. And next week on the 26th, we’re switching it up, and on a Wednesday we’ll be having an event where we’re going over DataLakeHouse on Snowflake.

After that, we’re having another event coming up next month where we’ll be going over customer segmentation in Snowflake using Python. So that should be another fun one for all of you Python lovers out there.

That is it for today. Please like us on social media and follow us if you like our videos. Feel free to invite anybody to our next events and we can’t wait to see you there. Thank you!
