SQL Notebooks to DBA's and Developers by Deepthi Goguri


Video Transcription

Hello, everyone. Thank you so much for taking your time for um attending uh OTE Global Summit. I'm so excited and honored to be here today.And uh today, I would like to speak on SQL notebooks for DB A and developers if you are a developer or a DB A and if you are not familiar with the notebooks, this is a uh absolute beginner session. It's a one on one session. I can say I will show you with an examples. Um What, what can the SQL notebooks can offer for the developers and D BS? And how can you actually uh leverage using it uh up to the maximum myself. I'm in, I'm database administrator having about like eight years of experience. This is my old slide. So it's, it's your seven years, seven plus 18 years of experience. I'm a Microsoft certified trainer and also a Microsoft Certified solution expert. I did my master's in computer technology at Eastern Illinois University here in Chicago and uh I do blog for DB N nuggets.com, that's my website. And if you have any questions regarding this session later or if you'd like to contact me or get in touch with me later. DB N Nuggets is my uh Twitter handle.

And uh you can, there is the email address and if you can also contact me on linkedin, that would be wonderful because I will be posting, what are the my latest session that I'm going to present and it's all free you can all attend. So do you have any uh any situation where you get a constant emails every now and then, like every couple of weeks you receive emails uh from your auditing request saying that, hey, can you please run certain queries, whatever queries you wanted to run? But give me this kind of results like who have access to what databases or what are the databases we have on server. Those are the typical audit request that you receive and if you receive one time a year or like two times a year, that is totally fine. But what if if you receive every single week, like twice weekly or twice weekly? I was in a situation where like I was so overwhelmed with those requests, I was so pissed off to be very frank because once I get those requests, I need to go ahead and first make sure where this, where those queries are because I'm only, I only not take care of the audits, but also I take care of many other things because I'm a DB A, I need to work on performance tuning and so many other things.

So I need to first remember where I put those scripts. They, they may be in some other folders or I may move the folders. You know, I need to remember the locations where I saved those files, the queries that I need to run on the drive. So I created this document, a word document where I step by step I mentioned first on this query to grab this information. The the query that you have to run is in this location. And then I need to go ahead and click on that location and then I need to grab the query and then run the query in management studio and get the output and guess what? They also need that, you know, they don't trust if you copy it and put it in the CS V file and then submit over. They don't like it. They want to make sure that you're submitting the genuine information. So they also need a screenshot of the timings and the server where you have actually grabbed that information from which is like another level of tedious task, right? I need to take the screenshots, put it in a different set of word document, put paste the query results as well as the screenshot.

And this I have to do as a process for every single audit request, which is like too much tedious even to hear about it, right? That's when I came to know I was like researching a lot on how can I make this process? Very simple, not only for the audit request, but also um to help your other coworkers or if you wanted to review the code of other people or if you want to document things of how you actually work and how can you make it simple for the other D BS and developers out there to make a documentation, right?

But how can we make it so simple, right? To make it so simple where we can fit the words, right, the text, the code as well as you can also save the executable code within the same document. I mean that is really cool right now what I mean by all of this, I'll be showing you what I mean like what is this executable code in a demo? But then you can write as well as you can write the query and also you can execute the same query within the same sequel notebook, which is kind of cool. I started using for my audits. I started creating a documentation of any process, right? Any process involving sequel. But when I say sequel notebooks, you only doesn't have to use SQL. There are also other languages that you can use like the Python um and Spark SQL, there are different types of languages that you can use. Now, it will help save a lot of time and you can also review the codes of the other developers if they have any issue, they can just send it over the document to you, you can execute it, you can fix the document, save it and send it over to them. Very simple, like troubleshooting documentation you can do and it will basically kill off those tedious task, 90% of the timings. So as an overview, we are going to see, what are these Jupiter notebooks, what are the sequel notebooks? What are the different kernels like?

What are the languages that you can use and how to actually use the SQL notebook to create your first um useful documentation or like any uh any like an executable code within a document. There are certain limitations that I'm also going to talk in it. So for the first for this to happen, right, for creating this document, the first thing that you need is to have a Azure Data studio. So it's free, it's a free tool. You can go ahead and download it right now. Um In the Microsoft uh documentation, there is like go ahead and search in Google as your data studio, you have the download link, download it. So once you download it, you can go ahead and create your notebooks. Like I said, instead of reading all of this slide, right? Let me go ahead and directly uh show you how it actually works. Let me quickly see if there are any comments out there. Um OK, let me share the Azure notebook. So for example, let's say I have actually downloaded. This is your data studio right now. Don't be overwhelmed by the screen that you are seeing right now. But then in the bottom, if you see here is your data studio symbol over here, right? Click on that and then go go select on the is your data studio so that I'll show you how it actually look like, right? And then here once you open up the Azure Data Studio, once you download and install, go ahead and click on new.

So you can make a new connection to the server, you can type a query connecting to the server SQL server or you can create a sequel notebook, right? I would like to go ahead and create a sequel notebook. So I'll click on um a new notebook here, right? It will go ahead and open up a new notebook. So here if you see if you observe here, right? This is how the SQL notebook will look like. But I also created a simple notebook instead of me typing in front of you. I already created a notebook, a very simple notebook here, I will show you how it works. So here we have cells, the code cell where you can type the code and execute it at the same time as well as text code where you can document things like type it some documentation, explaining the code. So once I go ahead and click on cell, if I click on text cell, right? This thing will get popped up like this is the text cell where I can go ahead and type on it. So typing is very simple. There are few different tabs in the SQL notebook. If you see right side the screen, there is this eyeball symbol, right? Reach text view, right side your screen top. That means like whatever you write is what you see that is it the second one is something different.

It's a split view. Now, when I click on it, it will open it in a different format, two different formats. One is in a plain text, pictures and everything. But if you see the left side of your screen, this is something like hash and then uh welcome to the cloud data driven. This is like a different, like I created this document for a different purpose. So I'm showing like how to create it. So the language that you are seeing here, the hash and everything is a marked down text. If you are familiar with markdown text, you can use markdown but do not worry if you are not familiar with it because the what you see is what you get, you can directly type it here. But there is also the third option here right at your screen. The markdown is complete markdown. If you are familiar, you can directly use it and then go to the um what you write. What you see is what you write something like that, you can directly see what you have written. You can use three different formats to do that, but I do not know markdown. Not much familiar. So I'm using it directly. The first tab over here where you can go ahead and type see if I type it here.

You, you have like you can bold bowl something, right? You can bold, you can do the italic, you can underline, you can highlight as well, you can highlight and then if you want to put this as a main paragraph heading, you can use it as a heading. Number one, if you wanted to do is the heading two that will become like the second subdivision of the headings. You can attach the pictures right now. Why I attached this all in one picture is like we can say that um the sequel notebooks are like all in one. And now what is this all in one is meaning like when I I studied in India, I did my undergrad in India. So we used to have like little books where in our schools where we used to have a single notebook and we used to have multiple languages like math, science, English and the other languages all combined together in a single book. So if you read that book incomplete, you will pass all of the exams. It's not like a uh exam gram or something, but you have the bulk book together. So why I actually uh put that picture here is because I think like SQL notebooks are all in one as well where you can use not only the sequel, but also if you see the kernel here, if you expand the the tab kernel, meaning the language, you can see not only use a SQL, you can use a Spark, right?

And Python powershell, you can use all of these languages and you can create like just like that each language in the all in one book, the kernel, you can create the sequel notebook separately. You can, you can use the Python notebook. You can use the Spark notebook, you can mix and match in a single notebook. You can use the sequel and then Python all the languages together. But you know what is the problem? If you do that, you can directly execute a single sequel notebook all at the same time, no matter how much code you write. Like let's say you have written the 20 code in it, you can run all in one shot. But let's say right, I'm I'm dragging it down. Let's say I have um I have written this particular code here where I have uh written uh select top 10 from uh from some table maybe. And then when you come down, I have written in a different cell because you can create any number of code cell here. So let's say I have created like 20 cells with 20 different codes. One is SQL and one is Python totally fine. But then I want to run all at the same time, right? If I hit, run all here on the top here, then the first code, let's say the first code is the first code cell is SQL, it will run the code fine.

But then when it comes to the second cell, if I read, if I write the code in Python, then it will go ahead and fail because I need to manually change the kernel before running the code. So it is not suggestible to mix and match the languages within a single uh SQL notebook, but then create different sequel notebooks, different notebook for every language you want to use and then combine all together and put it under a major. Like on the top, you put it in a different notebook, you can place all of these uh notebooks as a collection and you can create a Jupiter notebook. So Jupiter Notebook, it's nothing but the collection of the sequel notebooks collection of different sequel notebooks together.

For example, I have uh I have seen here, I have written something here. Yeah. If you see here on the left side of your screen, I have created the first Jupiter notebook and then I have created all of these like contents. What is a query store? Because couple of months ago, I have, I have given a presentation on query store and I thought why not use SQL notebooks as my presentation. So, what I did is I created a Jupiter notebook and then I went ahead and create the contents. What is a query store? If you see here each, this particular contents is a different notebook. All I have written is a text cell, right? And the second one is what is query store? I put it as a different notebook and the benefits, right? Benefits is a different one. But I have written a code in a complete different notebook. And then I have placed one by one in a format, right? It's like a notebook, real notebook where I have this collection of the SQL notebooks underneath a single Jupiter notebook. After I explained that I went to how to upgrade how I use the query store in the upgrade scenario. It's all SQL code, right? I had my comments, I had my code now I can draw, run all at the same time. I can, I can click this button and run all at the same time by connecting to the SQL server. But how do you connect to the SQL server from here? Right? Let me open up that.

Let me close all of this whatever I open mm notebook again, this one. So let's say let me close this one as well. Um So let's say I would like to run this code, right? So I created a COVID database um by collecting all the public documentation like in the CS P format. I downloaded the data and then I imported into SQL server and they created a new database known as COVID database. And now I have these tables where I can go ahead and play around. Now how to attach, how to attach the SQL notebook and how to run the queries on the SQL server, right? By using the notebook. Now, go ahead and click on attach to. Now make a connection, click on select connection or change the connection here. And I would like to uh click on COVID database. So here if you see the connection details, I'm connecting to the connection type and SQL server and then I'm connecting to the local server. This is my local computer. I have, I'm connecting to and then I'm connecting to the COVID database. You can connect to any database, even the master database and any. But I would like to connect to the COVID because I'm running on the COVID database. This particular query, you can uh name this as a server group. It is just a naming thing. Nothing much specific to any function of it. But then you can name it. I just wanted to connect to the server. Now, I'm connected to the SQL server COVID database.

Now, once I run this particular query, right, it executed. So that's how I'll make the connection. But let's say uh I would like to, I executed this query, but I want my other DB A to know what is the query, but I don't want him to see the results of it. There is a way that I can collapse the results on the top right view. If you see there, there is an erase button over there. The se second one, if you click on that one, it will clear out the results. And if you do not, like if you do not want to see the entire code of the code cell, you can even collapse that one. Collapse, meaning it will shrink. The first option is like collapse. Cells will actually shrink. If you run it again, you'll see the results again. If you want to collapse, clear the results, you can clear and then share the, share the notebook to your colleagues or friends. And then you can also run this notebook with the parameters. Let's say you created a notebook, but then you don't want to manually run the sequel notebook.

Whenever you get the audit request, let's say we have taken the example as an audit request, you have created this report and then uh let's say you get the request all the time and you don't want manual work. You can actually go ahead and schedule this sequel notebook to run on certain schedule times or you can schedule it. And then whenever they need it, just go ahead and click on run, run the job so that it can produce the results for you. And those results can be actually the sequel notebooks. And so that you don't have to download and then send it over as an attachment to the email address to the sender email address. But you can also automate that process. Once you run the job, that particular results output will be actually sent automatically to the user who have actually requested it. You can also do that. You can also parameterize right. You can also change the parameter values depending upon your environment. But currently the sequel notebook with the SQL kernel is not supported to parameterize the notebook. But then the other kernels like the uh Python and uh the other kernels, right? The Python and the powershell, they have the way to parameterize those values for you. And also if for example, if you want to convert this um notebook like general notebook, let's say this is a notebook, right?

If you want to convert the word format of the notebook into the dot SQL format, if you have a query and coordinate, you can go ahead and click on this last button over here. Once you click, click on it, it will automatically um convert that particular SQL notebook into a dot SQL format. Not only that if you observe here, I created a document um on COVID analysis. Before moving ahead to describe what the notebook is about. I would like to show you a couple of things here. So we have seen the options here how to collapse and um how to do the things, right? But then once you execute the results, you don't, you you not only see the results in the table format like this like once you execute, you see it in this format, right? But then there is a way to see the results in the graphical view. Now you may be asking me a question like, oh but why do we need that option? Right? See you are a developer or a DB A you know the code but not necessarily you can't expect everybody in your office like in your company know the code, right? You have business users where you need to explain them the business in a plain view format so that they can make the decisions for you like the management. Let's say you have to give a presentation showing the results of any result.

Like what is the trend in the data? Right? If they do not know the code, how do they understand one is like the graphical view format where that doesn't have to know the core but they can clearly see the results, right? So that's when there is this um in the results. If you see here on the right side, the last option show chart will actually show the same results in the graphical view format so that they can make the decisions and understand the uh data trends very easily, not only that, but also you can export right, you can export this particular results like this table results, right?

You can export it as ac you can save as a CS V on your system or you can save it as an Excel and the other options like a JSON and then you can also save as XML format. So here is uh my COVID-19 analysis. This is just a my uh my study on how I analyzed the COVID data uh to actually help the people in the workplace environment to encourage them to have the vaccination. So I have CRE I have put this thing together, downloaded the CS V format of COVID data from all the um web sources which are valid web sources like World Health Organization. You can follow this link to see more of this. And I have grabbed all the information like I created the database and then I provided here a couple of like um text cells I can say and then I added GIF. Now the reason why I added is like we can also add gifs, we can also add images in the, we can also add images in the notebooks, not only the code, not only the um words, but also the images and gifts here. I grabbed the top 10 count countries with most COVID cases. So what I'm doing here is I created the document, I'm running the queries, I'm connecting to the COVID database, which is the SQL server, right? And then I'm running the queries on the top of it.

So if you see the graphical format, these are the top 10 countries with the most COVID cases. The Africa is the number one. Like if even if nobody knows about the coding for the business users, if you show this document, they can just hover their mouse and see what are the top 10 countries. The second is like information on the COVID vaccination based upon the continent. So now we are going the continent wise, what are the top continents, continent wise? What are the vaccinations? So the total vaccinations that are available are these many right? These are the total vaccinations, people partially vaccinated, fully vaccinated. And what is the total population of it?

Look at the population and the number of vaccinations available more than the population, the vaccinations are available and in the United States per State, if you count, what are the uh what are the, how many people got vaccinated per state within the United States? And if you see here in the graphical view, I also added the total number of vaccinations in the United States. And here post vaccine, post vaccination COVID cases in State of California. So I took the sample California state. So on the California State, I have grabbed the information of like how many unvaccinated cases end up in the hospital and how many vaccinated cases who end up in the hospital? See the trend in it unvaccinated versus vaccinated cases in the State of California. If you see here again, unvaccinated cases, hospitalized, vaccinated, hospitalized, unvaccinated deaths. If you see how much of uh the difference unvaccinated deaths. So the business users, if you see like if you're telling somebody to take the vaccination, if they can directly see this documentation, right, they will know the impact of taking the vaccination and how the vaccination can actually save their lives, right? This is a proven analysis.

I'm not just saying like go take the vaccination. So the vaccination manufacturers who is doing better than who? Right. So here if you see pfizer is doing the better job than compared with the other other companies out there. And then there is some survey that went on like asking the people like if you are in 6 ft, are you willing to take the vaccination? Like, are you willing to put, put on the mask if you are like 6 ft if you are outside of the house and the people who said, yes, absolutely. We wanted to make sure we are healthy as well as like making sure other people are safe. And the top most counties who are willing to put the vaccinations are these top 10 counties like the number one is Connecticut, these are the good guys, right? And the top 10 counties who said, like, never, we don't want to use mask whatsoever, right? And these are the counties I do not want to point anybody like this is like a proven analysis. I'm not like masking up anything here, right? So these are the counties and I just added a gift over here like wear the mask, what is the proper way to the we to wear the mask? And then not only that, if you observe here in my analysis that I put this thing together, right?

I grabbed the top 20 counties of like um what are the top most 10 counties for the top 20 counties? Uh With the COVID cases, there is something wrong in this particular result. If you observe the first one, if I hover my mouse, it says like high income and then the second one is upper middle class income and there is somewhere like the lower low middle class income as well. Somewhere here, if you see here, somewhere here, it's low, lower middle class. So my analysis like why I do not know why they actually added this kind of like um linking up the life of a person to the financial status. Uh That doesn't make any sense to me. So I thought like my analysis doesn't want to add this kind of information. I want to make it so clear and genuine and life is a life, right? So I don't want that information here. So I thought of cleaning the data. So what I did was like, I grabbed all the data related to those three financial status. I grabbed into a different new table because I'm a DB A before deleting something, I need to make sure I have a backup. So I took, I grabbed all of the data, put it in a different table and then I deleted the data from the actual table because I do not want that information in my analysis.

So I deleted that and then I made sure everything is deleted again by running the query. So SQL notebooks is not only for the retrieval of the information, it is for inserting, updating and deleting the data as well. So what did I observe in my, what did I confirm? And what did I observe from my analysis? If you look at the data that we have gone through? Like the people who got, who did not get vaccinated and end up in hospital is like a way too high when compared with the people who got vaccinated and end up in the hospital, right? So my analysis concluded that it is it is necessary and it is advised again, we are not forcing anybody but my analysis literally proved that people who are taking the vaccination has end up in the hospital. The the uh the rate of the people who are end up in the hospital is very less and you can stay healthy. And I added a statement at the end of the um documentation like it doesn't cost you to protect your life, but it does cost your life if you're not protected, if you think about it, it's true, right? So that's my actual analysis.

And so why I put this thing together is like if you see here, we can execute the code, save the results or if you do not want to share the results, right? Just want to share the code, you can just collapse the results and then share the code to your developers. And if you have any other developer or a DB A who are struggling with solving a problem, they if they can put this, put a documentation together in a SQL notebook and share the share their results with you, you can go ahead and check uh what they are doing anything wrong. They are doing like coding wise. If you need to review their code, add the, make it a better code and then share it back to them. So many users, not only the documentation auditing, what not, you can actually make books out of the sequel on books and I became a fan of it, I started using for my auditing purposes personally. So not only that the other advantage, the beautiful advantage that you have by using the notebooks is like if you see on the left side of your screen, there are like the first option is connections.

Now once you hit the connections, I would like to connect to the master uh like the master database. So here I connected to the COVID database. So what once I click on that right? There is also this other. But let me go ahead and connect to a different uh connection. So I would like to make a new connection. All right, I would like to connect to a local server again. And then I would like to connect to master database this time. OK. So once I click on connect here, if you see here, I'm seeing all the tables within the master database. Sorry. Yeah, master database. But then now on the top here, I would like to click on the local, I would like to go to the server level and once I click on the server level, if you see here there is the SQL agent, right? So what is the SQL agent where you can create the jobs and then run it automatically? So you can also see the SQL agent underneath your management studio within SQL server. But then you can also actually schedule the SQL notebooks and make sure they run on time or you can manually run them.

So I do not want to show you how to actually create this notebook because there is already other video um by Tao Bali who have actually described how to do that. I will provide you a link down there. But then I would like to just show how you can use this um SQL notebooks and automate them, right? So for example, I'm going to uh show you this particular notebook, right? And you can directly run it or I actually executed these many times, right? So I would like to see the latest execution, right? So I would like to go to see the notebooks, the output of this notebook, I clicked the notebooks and this is the job that I executed. So I double clicked on it. And these are the recent entrances and these are the outputs with the little green collective mark over here. So let's say this is the output actually and the output is also in the sequel notebook. But then with the results, what does, what, what do I mean by that? Right? So let's see. OK, let me see this. Um Yeah, so let me open this particular uh notebook over here. So if you see, I actually um I actually created a SQL agent job by mm um by telling the sequel agent job to run this particular uh SQL notebook on scheduled times, right? So once it executed, it will provide you an output in the same sequel notebook format.

But then with the results, that's all it is, right? And then there is to then another adding in another step, you can send this output directly to the email, but we do not have much time in this um for this session. So that's why I just put it this way. But if you see like if I edit this job and see how I actually created it all I did was like I name I provided the name I provided the owner. And then uh if I see in the, let me see 11 more time, I do not move. If this job actually, I need to go to the job. Uh It did the job. Yeah. Here if you see the steps, right? All we are doing is it um edit the step. So we are actually we created a power command here. All it is doing here is like grabbing that query as a SQL notebook and then executing it during executing it at the schedule times. Now, how, how do we actually schedule it right here? If you see you, you can't schedule it. There is no way that you can schedule it. So all you have to do is like in your management studio, right? If you see this is our management studio and if you see here, the same job is created here as well because you're creating the SQL agent job in the notebook, but then it will be also created in the server and you can go ahead and schedule it from here.

You can't do it from there, but you can schedule, you can definitely schedule it within your management studio and let me go ahead and share back my slides. Um So these are the kernels I would like to show and then benefits like we discussed the benefits. So, um so one of the limitations like I said you can't mix and match, you can mix and match the kernels within a single SQL notebook. But when you want to run all the cells at the same time, it will throw an error. And sequel notebooks is also not for like if you have like 6 million records as an output, you might be seeing latencies and the is your data studio might get freezed up. Um If you, if you run those kind of like C select star from a huge tables, right? That is one of the limitations and the other limitation is that you cannot parameterize your SQL notebooks. You can do that necessarily for the P cover as well as for the spa but not for the sequel. And where like a couple of moments ago, I mentioned like you can actually schedule the sequel notebooks as an agent job, but I did not show you properly how to actually create that SQL agent notebook, right?

So you can go ahead and see this youtube video by Tao Bali for the MVP edition by Data exposed. You can they are like step by step video. It's like 14 minute video, I think so. But it is very clear and then all the references are there are from the Microsoft docs. You can see that and if you want to dive very deep into the SQL notebooks and how to actually create the powershell notebooks and then um and know more about the notebooks. Follow the video down below over there that that is being created by Julie uh from Microsoft. She went really deep into this particular topic and I would like to thank uh Women in Tech Global Conference for one more time for giving me this wonderful opportunity to present this session for you. And thank you so much for attending this session. Let me go ahead and uh go go back to my slides, not the slides, actually the video. Uh Yeah, so I think I'm done. So I'll stop sharing here and then if you have any questions, uh please free, feel free to put it in the chat. Hm Yeah, I do have the linkedin page. Uh If you can type my name on linkedin, right?

And, and I if you are a man or like men or whoever it is, like, if you, if you need any suggestions on like how to start your career in technology and if you can contact me, I provide, I provide the free um like recommendations of where you can start. Like, because when I started my career, it was very difficult for me because I do not have much resource of where to, where to start and how to learn stuff, looking all the things at the same time really overwhelm me. And in that moment, if there was a, there was someone who can actually guided me that would have made me um you know, uh made that would have made things very clearer and simpler for me. But then I thought of, I thought of taking, taking my privilege. Like right now I'm a public speaker. So I thought I could also help the other people out there who are really struggling to get into the technology. So thank you Alba. And uh yeah, definitely I can, I can uh not necessarily like a uh step by step mentor, but then I will be there if you have any questions to me. Um I can definitely assist and uh and I'm also on Twitter at DB nn, I guess I will be chatting here, like I'll be typing in here. That's my uh Twitter handle. I'm very active on linkedin and Twitter.

I'm not on Facebook or like, I'm not on other channels out there. But then um I'm, I'm very active on linkedin and Twitter. So if you can contact me and also I do have my website if you are a SQL person and if you are, if you are trying to learn SQL or as your sequel uh for the beginners, I'm, I'm writing and I'm consolidating the articles together and writing some blog posts on as your fundamentals for beginners.

You can also go ahead and look into that if you would like to and thank you so much. Once again, I do not want to really go ahead and um step on other, other speaker's time. Uh But yeah, thank you. So much once again and I hope you all enjoy the conference and see you all next year. Bye bye.