
JSON

It's day 90, and today we're going to start learning how to use JSON (JavaScript Object Notation, pronounced 'Jason') to get data from other websites. It's the first step on our journey to web scraping.


JSON is a text-based way of describing nested data, such as a 2D dictionary. This matters when we send a message to another website and need to decode the reply. Most of the time, the message we get back will be in JSON format, and we need to interpret it in Python as a dictionary to make sense of it.
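To see what that looks like in practice, here's a minimal sketch: a hand-written JSON string (the names are purely illustrative) parsed into a Python dictionary with the standard json library.

```python
import json

# A hand-written JSON string, shaped like a tiny user record (illustrative only)
text = '{"results": [{"name": {"first": "Ada", "last": "Lovelace"}}]}'

data = json.loads(text)  # parse the JSON text into a Python dictionary

# Now we can navigate it like any nested dictionary
print(data["results"][0]["name"]["first"])  # Ada
```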


Go Get The Data

πŸ‘‰ Let's do a simple data grab from a free-to-use website, randomuser.me, which generates data about a fictional user.


import requests # import the required library

result = requests.get("https://randomuser.me/api/") # ask the site for data and store it in a variable

print(result.json()) # interpret the data in the variable as json and print it.


Run it. You'll get lots of data.
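One detail worth knowing: result.json() is essentially a shortcut for parsing the response body as JSON text. The sketch below fakes a response body as a plain string (no network needed, and the data is a trimmed-down, illustrative shape) to show the same parsing step.

```python
import json

# A canned response body, shaped like a trimmed-down randomuser.me reply (illustrative)
body = '{"results": [{"gender": "female", "email": "ada@example.com"}]}'

# Roughly what result.json() does with the response's text behind the scenes
parsed = json.loads(body)
print(parsed["results"][0]["email"])  # ada@example.com
```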


Tidy it up

πŸ‘‰ Next, let's try to tidy that up a bit.


import requests, json # import the requests and json libraries

result = requests.get("https://randomuser.me/api/")

user = result.json() #a dictionary containing the user's data

print(json.dumps(user, indent=2)) #outputs the json to the console with an indent to make it more readable.


This should format your output a little better, and you should be able to see that it is indeed in dictionary format. The top-level dictionary has a single key, results, whose value is a list of users.



Output
πŸ‘‰ Here's the code to output one piece of data about the user. I'm going to output their first and last names. I've commented out the 'output everything' line of code to focus on that one piece of information.

import requests, json 
result = requests.get("https://randomuser.me/api/")
user = result.json() 
# print(json.dumps(user, indent=2)) 
name = f"""{user["results"][0]["name"]["first"]} {user["results"][0]["name"]["last"]}""" # Get the first and last names from the results dictionary and assign to a variable
print(name) # output the variable

Every time you run the code, it should get a new random user from the site and output their name.
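Chained indexing like user["results"][0]["name"]["first"] raises a KeyError if any key is missing. A more defensive version uses .get() with fallbacks; this sketch runs on a canned dictionary (the sample names are illustrative) rather than a live request.

```python
# A canned user dictionary, shaped like randomuser.me output (illustrative)
user = {"results": [{"name": {"first": "Grace", "last": "Hopper"}}]}

results = user.get("results", [])  # empty list if the key is missing
if results:
    name_info = results[0].get("name", {})  # empty dict if "name" is missing
    name = f"{name_info.get('first', '?')} {name_info.get('last', '?')}"
    print(name)  # Grace Hopper
```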

Pictures, Everybody Needs Good Pictures
If you scrolled down the big JSON data file, you might have noticed that images were also part of our random user's profile.

πŸ‘‰ Let's get the image as well and store it in a local file. Here's the code in isolation:

image = f"""{user["results"][0]["picture"]["medium"]}""" # Get the user's profile picture and assign to a variable, changing 'medium' to 'large' will make the image less pixelated
picture = requests.get(image) #downloads the image
f = open("image.jpg", "wb") # opens the image.jpg file for writing in binary (data of the image will be added to the repl)
f.write(picture.content) #writes the image to the file  
f.close() #closes the file

print(image) 

πŸ‘‰ And here's all the code:  

import requests, json # import the requests and json libraries

result = requests.get("https://randomuser.me/api/")
user = result.json() #a dictionary containing the user's data
# print(json.dumps(user, indent=2)) #outputs the json to the console with an indent to make it more readable.

name = f"""{user["results"][0]["name"]["first"]} {user["results"][0]["name"]["last"]}""" # Get the first and last names from the results dictionary and assign to a variable

image = f"""{user["results"][0]["picture"]["medium"]}""" # Get the user's profile picture and assign to a variable, changing 'medium' to 'large' will make the image less pixelated
picture = requests.get(image) #downloads the image
f = open("image.jpg", "wb") # opens the image.jpg file for writing in binary (data of the image is added to the repl)
f.write(picture.content) #writes the image to the file  
f.close() #closes the file

print(image) # output the variable 
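A common Python idiom is to let a with block close the file for you instead of calling f.close() yourself. This sketch writes some placeholder bytes the same way the downloaded picture bytes would be written (the bytes and filename are just examples).

```python
# Placeholder bytes standing in for picture.content (the downloaded image data)
data = b"\xff\xd8\xff"  # the first bytes of a JPEG, purely illustrative

with open("image.jpg", "wb") as f:  # file is closed automatically when the block ends
    f.write(data)
```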

Loops Loops Loops
πŸ‘‰ We could use a loop to achieve the same thing, but make our code a bit neater and more readable. We only get one user back from this website, but this code would deal with multiple users too.

I've gone back to just outputting the name to simplify the example. Here's the code:  

import requests, json

result = requests.get("https://randomuser.me/api/")
user = result.json()
# print(json.dumps(user, indent=2)) 

for person in user['results']: #loops through each person in the results dictionary
  name = f"""{person["name"]["first"]} {person["name"]["last"]}""" #creates a string with the name of the person

  print(name)#prints the name of the person 
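To see the loop actually earn its keep, here's the same pattern run over canned data containing two users (the names are illustrative). randomuser.me's documentation also describes a results query parameter, e.g. ?results=5, for requesting several users at once if you want to try this against the live site.

```python
# Canned data shaped like a multi-user randomuser.me response (illustrative)
user = {"results": [
    {"name": {"first": "Ada", "last": "Lovelace"}},
    {"name": {"first": "Alan", "last": "Turing"}},
]}

names = []
for person in user["results"]:  # loop through each person in the results list
    name = f"{person['name']['first']} {person['name']['last']}"
    names.append(name)
    print(name)
```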









