Assignment
Motivation
The Internet gives us access to a vast amount of data in digital form, but one of its disadvantages
is that this data is ephemeral. Web sites appear, update their contents, and disappear at bewildering speed. While we can still read books printed hundreds of years ago, or even a papyrus written thousands of years in the past, we often fail to find a web site or online article from just a few years ago!
One prominent attempt to preserve digital data is the Internet Archive, a vast repository of
web pages downloaded and stored over a long period of time. Via the Internet Archive's Wayback Machine interface you can retrieve archived copies of web pages from specific dates in the past.
In this assignment you will build your own Internet Archive-like IT system. Your system will give its user the ability to view news or current affairs articles that have been stored in a permanent archive, as well as the current news. It will have a graphical interface that allows the user to select dates from the archive, after which the relevant news articles can be extracted and viewed in a standard web browser. It will also allow the user to access the latest data from an online "feed".
Goal
For the purposes of this assignment you need to choose a regularly-updated source of online
news or current affairs. You have an entirely free choice of the news category, which could
be:
• world news,
• politics,
• fashion and lifestyle,
• arts and entertainment,
• sports,
• business and finance,
• science and technology,
• etc.
Whatever category you choose, you must be able to find an online web site which contains at
least ten current stories or articles on the topic at all times. The web site must be updated
regularly, typically at least once a day. Each of its articles must contain a heading, a short
synopsis, a (link to a) photograph, a link to a full description of the story, and a publication
date.
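As an illustration, an RSS-style feed item typically carries all five of these elements between fixed pairs of tags. The sketch below uses a hypothetical feed item (the tag names and layout of your chosen site will differ, so inspect its source first) and shows how a non-greedy regular expression group can capture just the content of each element:

```python
# A hypothetical RSS <item>; real feeds will use different tags and layout
from re import findall, DOTALL

sample_item = '''<item>
<title>Example headline</title>
<link>https://example.com/story1</link>
<description>A short synopsis of the story.</description>
<media:thumbnail url="https://example.com/photo1.jpg"/>
<pubDate>Wed, 18 Oct 2017 09:00:00 GMT</pubDate>
</item>'''

# Each element sits between a fixed pair of tags, so a non-greedy
# group (.*?) captures just the content between them
title = findall(r'<title>(.*?)</title>', sample_item, DOTALL)[0]
date = findall(r'<pubDate>(.*?)</pubDate>', sample_item, DOTALL)[0]
print(title)  # Example headline
print(date)   # Wed, 18 Oct 2017 09:00:00 GMT
```

The same pattern shape works for the link, synopsis and image URL; the DOTALL flag lets a match span line breaks when an element's content wraps.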
Specific requirements
To complete this task you are required to produce an application in Python similar to that
described above, using the provided news_archivist.py template file as your starting point.
It is suggested that you use the following development process:
1. Find a suitable source of online news articles. Keep in mind that the web site you
choose must be updated daily, must have at least ten articles online at any time, and
each article must have a heading, a short synopsis, a (link to a) photograph, a link to a
full description of the story, and a publication date. Some starting points for finding
such sites can be found in Appendix A below.
2. Create your “Internet Archive” by downloading seven copies of the web site, one per
day.
3. Study the HTML/XML source code of the archived documents to determine how the
elements you want to extract are marked up. Typically you have to identify the
markup tags, and perhaps other unchanging parts of the document that uniquely identify the beginning and end of the text and image addresses you want to extract.
4. Using the provided regex_tester.py application, devise regular expressions
which extract just the necessary elements from the relevant parts of the archived web
documents.
5. You can now develop a simple prototype of your “back end” function(s) that just extracts and saves the required elements from an archived web document.
6. Design the HTML source code for your “extracted” news stories, with appropriate
placeholders for the downloaded web elements you will insert.
7. Develop the necessary Python code to extract the HTML elements from an archived
document and create the HTML file. This completes the major “back end” part of your
solution.
8. Develop a function to download the “latest” news and save it in the archive.
9. Develop a function to open a given HTML document in the host computer’s default
web browser.
10. Add the Graphical User Interface “front end” to your program. Decide whether you
want to use push buttons, radio buttons, menus, lists or some other mechanism for
choosing, extracting and displaying archived news.
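To make steps 6 and 7 concrete, here is a minimal, hypothetical sketch of an HTML template with placeholders filled in via Python's str.format. The field names and markup are illustrative only, not a required design:

```python
# Hypothetical per-story template; placeholder names are illustrative
story_template = '''<article>
<h2>{number}. {title}</h2>
<img src="{image}" alt="{title}"/>
<p>{synopsis}</p>
<p>Full story: <a href="{link}">{link}</a></p>
<p>Dateline: {date}</p>
</article>'''

# Fill the placeholders from one (hypothetical) extracted story
story = {'number': 1, 'title': 'Example headline',
         'image': 'https://example.com/photo1.jpg',
         'synopsis': 'A short synopsis.',
         'link': 'https://example.com/story1',
         'date': 'Wed, 18 Oct 2017'}
article_html = story_template.format(**story)
print(article_html)
```

In the full solution you would build one such fragment per extracted story, concatenate them, and wrap the result in a complete HTML document before writing it to a file.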
regex_tester.py
#-----Description----------------------------------------------------#
#
# This program provides a simple Graphical User Interface that
# helps you develop regular expressions. It allows you to enter a
# block of text and a regular expression and see what matches
# are found. (Similar web browser-based tools can be found online,
# but the advantage of this one is that it's written in Python, so
# we know for certain that it obeys Python's regular expression
# syntax.)
#
#--------------------------------------------------------------------#

#-----Useful constants-----------------------------------------------#
#
# These constants control the text widgets in the GUI. Change them
# if you want to modify the widgets in which text is displayed.

FontSize = 14          # Size of the font used for all widgets
InputWidgetWidth = 60  # Width of the search text and regex widgets (chars)
SearchTextDepth = 15   # Depth of the search text widget (lines)
MatchesWidth = 25      # Width of the matches found widget (chars)
MatchesDepth = 20      # Depth of the matches found widget (lines)
#
#--------------------------------------------------------------------#
#-----Main program---------------------------------------------------#
#
# Import the necessary regular expression functions
from re import findall, finditer, MULTILINE, DOTALL

# Import the Tkinter functions
from tkinter import *

# Create a window
regex_window = Tk()

# Give the window a title
regex_window.title('Regular Expression Tester')

# Create some instructions
search_text_instruction = "Enter the text you want to search here"
regex_instruction = "Enter your regular expression here"
results_instruction = '''Instructions:

All matches found are displayed here.

Quotation marks are used to mark the beginning and end of each match - they are not part of the string returned.

Matches are displayed in the order found.

The matches are hyperlinks - click on them to see where the match occurs in the text.

Newline or tab characters in the match are shown as \\n and \\t.

Carriage returns (\\r) are deleted from the search text before searching.

If 'multiline' is enabled the beginning and end of individual lines can be matched as ^ and $, respectively, otherwise the entire text is treated as a single string containing embedded newlines.

If 'dotall' is enabled then a '.' in a regular expression can match a newline character, otherwise '.' does not match newlines.

If the pattern contains more than one group, each match shows all groups.

Note that you may not be able to directly copy-and-paste your regex into a Python script if it contains quote marks or other characters that are meaningful to Python - in this case you need to "escape" the special characters.'''
class HyperlinkManager:
    # Adapted from
    # http://effbot.org/zone/tkinter-text-hyperlink.htm

    # Constructor takes the Text widget to attach to as an argument
    def __init__(self, text):
        self.text = text
        # Formatting for the link tags
        self.text.tag_config("hyper", foreground="blue")
        # Bind the mouse events to the functions (defined below)
        self.text.tag_bind("hyper", "<Enter>", self._enter)
        self.text.tag_bind("hyper", "<Leave>", self._leave)
        self.text.tag_bind("hyper", "<Button-1>", self._click)
        # Create the empty dictionaries
        self.reset()

    # Clean-up function
    def reset(self):
        self.starts = {}
        self.ends = {}

    # Save the positions in the search text to jump to;
    # returns tags to use in the associated text widget
    def add(self, start, end):
        tag = "hyper-%d" % len(self.starts)
        self.starts[tag] = start
        self.ends[tag] = end
        # Return two tags, one just 'hyper' and one with the number
        # appended: the first tag is for the formatting, the second
        # is for deciding which action to take
        return "hyper", tag

    # What happens when you mouse over the link text
    def _enter(self, event):
        self.text.config(cursor="hand2")

    # What happens when the mouse leaves the link text
    def _leave(self, event):
        self.text.config(cursor="")

    # What happens when you click the text
    def _click(self, event):
        for tag in self.text.tag_names(CURRENT):
            if tag[:6] == "hyper-":
                start = self.starts[tag]
                end = self.ends[tag]
                # Unhighlight any currently highlighted text
                search_text.tag_remove('highlight', '1.0', 'end')
                # Move the marks to the section to be highlighted
                search_text.mark_set("matchStart", start)
                search_text.mark_set("matchEnd", end)
                # Highlight the section
                search_text.tag_add("highlight", "matchStart", "matchEnd")
                # Scroll down enough to see the end of the match, then up
                # to see the start; this gets as much of the match as
                # possible into view while keeping the start visible
                search_text.see(end)
                search_text.see(start)
                # There should be only one link tag per bit of text
                return
# Define the fonts we want to use, including a
# fixed-width one which makes all characters easy to see
fixed_font = ('Courier', FontSize)
label_font = ('Calisto', FontSize, 'bold')
ghost_font_colour = "#888888"
regular_font_colour = "#000000"

# Create a text editing widget for the text to be searched
search_text = Text(regex_window, width = InputWidgetWidth,
                   height = SearchTextDepth, wrap = WORD,
                   bg = 'light grey', font = fixed_font,
                   borderwidth = 2, relief = 'groove',
                   takefocus = False)
search_text.insert(END, search_text_instruction)
search_text.grid(row = 1, column = 0, padx = 5)
search_text.configure(foreground = ghost_font_colour)
search_text.tag_configure('highlight', background = "#FF0000")

# Clear the search text entry widget the first time it is clicked
search_text.pristine = True

def search_text_click(event):
    if search_text.pristine:
        search_text.delete(0.0, END)
        search_text.pristine = False
        search_text.configure(foreground = regular_font_colour)

search_text.bind("<Button-1>", search_text_click)
# Create label widgets to describe the boxes
matches_found = Label(regex_window, text = 'Matches found:',
                      font = label_font)
matches_found.grid(row = 0, column = 1, sticky = W, padx = 5)
enter_regex = Label(regex_window, text = 'Regular expression:',
                    font = label_font)
enter_regex.grid(row = 2, column = 0, sticky = W, padx = 5)
text_to_search = Label(regex_window, text = 'Text to be searched:',
                       font = label_font)
text_to_search.grid(row = 0, column = 0, sticky = W, padx = 5)

# Create a text widget to display the matches found
results_text = Text(regex_window, font = fixed_font,
                    width = MatchesWidth, height = MatchesDepth,
                    wrap = WORD, bg = 'light green',
                    borderwidth = 2, relief = 'groove',
                    takefocus = False)
results_text.insert(END, results_instruction)
results_text.grid(row = 1, column = 1, rowspan = 4, padx = 5, sticky = N)
results_hyperlink = HyperlinkManager(results_text)

# Create a frame to hold the controls
controls = Frame(regex_window)
controls.grid(row = 4, column = 0, padx = 5, pady = 5)

# Create a checkbutton to allow the user to enable multiline mode
multiline_on = BooleanVar()
multi_button = Checkbutton(controls, text = "Multiline", font = label_font,
                           variable = multiline_on, takefocus = False)
multi_button.grid(row = 0, column = 1, padx = 5)

# Create a checkbutton to allow the user to enable dotall mode
dotall_on = BooleanVar()
dotall_button = Checkbutton(controls, text = "Dotall", font = label_font,
                            variable = dotall_on, takefocus = False)
dotall_button.grid(row = 0, column = 2, padx = 5)

# Create a text editing widget for the regular expression
reg_exp = Entry(regex_window, font = fixed_font,
                width = InputWidgetWidth, bg = 'light yellow')
reg_exp.insert(END, regex_instruction)
reg_exp.grid(row = 3, column = 0, sticky = E, padx = 5)
reg_exp.selection_range(0, END)  # select all text if we "tab" into the widget
reg_exp.configure(foreground = ghost_font_colour)

# Clear the Regular Expression entry widget the first time it is clicked
reg_exp.pristine = True

def reg_exp_click(event):
    if reg_exp.pristine:
        reg_exp.delete(0, END)
        reg_exp.pristine = False
        reg_exp.configure(foreground = regular_font_colour)

reg_exp.bind("<Button-1>", reg_exp_click)
# Function to format a single match. This is made more complicated
# than we'd like because Python's findall function usually returns a
# list of matching strings, but if the regular expression contains
# more than one group then it returns a list of tuples where each
# tuple contains the individual matches for each group.
def format_match(result):
    if type(result) is tuple:
        formatted = ()
        for match in result:
            # make the match a "normal" string (not unicode)
            # match = match.encode('utf8')
            # make newline and tab characters in the match visible
            match = match.replace('\n', '\\n')
            match = match.replace('\t', '\\t')
            # put it in the resulting tuple
            formatted = formatted + (match,)
    else:
        # get rid of any unicode characters in the result
        # result = result.encode('utf8')
        # make newline and tab characters in the result visible
        formatted = result.replace('\n', '\\n')
        formatted = formatted.replace('\t', '\\t')
        # put quotes around the result, to help us see empty
        # results or results containing spaces at either end
        formatted = "'" + formatted + "'"
    # return either form as a printable string
    return str(formatted)
# Function to find and display results. This version has
# been made robust to user error through the use of
# exception handling (a topic we'll cover later).
# The optional 'event' parameter allows this function to be
# the target of a key binding.
def find_matches(event = None):
    # Clear the highlight tag
    search_text.tag_remove('highlight', '1.0', 'end')
    # Remove the hyperlinks
    results_hyperlink.reset()
    # Clear the results box
    results_text.delete(0.0, END)
    # Delete any carriage returns (\r) in the search text,
    # leaving just newlines (\n), to allow for text pasted from
    # an environment with different end-of-line conventions
    text_to_search = search_text.get(0.0, END)
    text_to_search = text_to_search.replace('\r', '')
    search_text.delete(0.0, END)
    search_text.insert(0.0, text_to_search)
    # Attempt to find the pattern and display the results
    try:
        # Do a single string or multiline or dotall search,
        # depending on whether or not the user has
        # enabled multiline or dotall mode
        flags = 0
        if multiline_on.get():
            flags = flags | MULTILINE
        if dotall_on.get():
            flags = flags | DOTALL
        # Perform the search
        results = finditer(reg_exp.get(), text_to_search, flags = flags)
        # Display the outcome
        results_text['bg'] = 'light green'
        # If item_num is still -1 after the loop, there were no results
        item_num = -1
        for item_num, match in enumerate(results):
            # Get the string result from the match
            result = match.group(0)
            # Get the indices of the start and end of the match
            # in the search text box
            start_index = search_text.index('1.0+%dc' % match.start())
            end_index = search_text.index('1.0+%dc' % match.end())
            if len(match.groups()) == 0:
                result = format_match(result)
            elif len(match.groups()) == 1:
                result = format_match(match.groups()[0])
            else:
                result = format_match(match.groups())
            # Insert the result with the hyperlink
            results_text.insert(END, result,
                                results_hyperlink.add(start_index, end_index))
            # Add the newline separately so the hyperlink doesn't
            # apply to the whitespace on the right
            results_text.insert(END, "\n")
        # This condition is True if no results were returned
        if item_num == -1:
            results_text['bg'] = 'khaki'
            results_text.insert(END, 'No matches found\n')
    # If anything goes wrong tell the user and assume the failure
    # was due to a malformed regular expression
    except Exception as exception_type:
        results_text['bg'] = 'coral'
        results_text.insert(END, 'Invalid regular expression:\n' +
                            str(exception_type))
# Create a button widget to start the search
search_button = Button(controls, text = 'Show matches',
                       takefocus = False, command = find_matches,
                       font = label_font)
search_button.grid(row = 0, column = 0)

# Also allow users to start the search by typing a carriage return
# in the regular expression field
reg_exp.bind('<Return>', find_matches)

# Start the event loop
regex_window.mainloop()
#
#--------------------------------------------------------------------#
news_archivist.py
#-----Task Description-----------------------------------------------#
#
# News Archivist
#
# In this task you will combine your knowledge of HTML/XML mark-up
# languages with your skills in Python scripting, pattern matching
# and Graphical User Interface development to produce a useful
# application for maintaining and displaying archived news or
# current affairs stories on a topic of your own choice. See the
# instruction sheet accompanying this file for full details.
#
#--------------------------------------------------------------------#

#-----Imported Functions---------------------------------------------#
#
# Below are various import statements that were used in our sample
# solution. You should be able to complete this assignment using
# these functions only.

# Import the function for opening a web document given its URL.
from urllib.request import urlopen

# Import the function for finding all occurrences of a pattern
# defined via a regular expression, as well as the "multiline"
# and "dotall" flags.
from re import findall, MULTILINE, DOTALL

# A function for opening an HTML document in your operating
# system's default web browser. We have called the function
# "webopen" so that it isn't confused with the "open" function
# for writing/reading local text files.
from webbrowser import open as webopen

# An operating system-specific function for getting the current
# working directory/folder. Use this function to create the
# full path name to your HTML document.
from os import getcwd

# An operating system-specific function for 'normalising' a
# path to a file to the path-naming conventions used on this
# computer. Apply this function to the full name of your
# HTML document so that your program will work on any
# operating system.
from os.path import normpath

# Import the standard Tkinter GUI functions.
from tkinter import *

# Import the SQLite functions.
from sqlite3 import *

# Import the date and time function.
from datetime import datetime
#
#--------------------------------------------------------------------#

#-----Student's Solution---------------------------------------------#
#
# Put your solution at the end of this file.
#
# Name of the folder containing your archived web documents. When
# you submit your solution you must include the web archive along with
# this Python program. The archive must contain one week's worth of
# downloaded HTML/XML documents. It must NOT include any other files,
# especially image files.
internet_archive = 'InternetArchive'

################ PUT YOUR SOLUTION HERE #################
pass
downloader.py
#-----------------------------------------------------------
#
# Web Document Downloader
#
# This simple program is a stand-alone tool to download
# and save the source code of a given web document. For a
# particular URL, it downloads the corresponding web
# document as a Unicode character string and saves it to
# a file. NB: This script assumes the source file is
# encoded as UTF-8.
#
# Q: Why not just access the web page's source code via your
# favourite web browser (Firefox, Google Chrome, etc)?
#
# A: Because when a Python script requests a web document
# from an online server it may not receive the same file
# you see in your browser! Many web servers generate
# different HTML/XML code for different clients.
#
# Worse, some web servers may refuse to send documents to
# programs other than standard web browsers. If a Python
# script requests a web document they may instead respond
# with an "access denied" document! In this case you'll
# just have to try another web page.
#

# Put your web page address here
url = 'http://www.wikipedia.org/'  # this web site is nice and doesn't block access
# url = 'http://www.wayofcats.com/blog/'  # this web site is nasty and blocks access by Python scripts

# Import the function for opening online documents
from urllib.request import urlopen

# Open the web document for reading
web_page = urlopen(url)

# Read its contents as a Unicode string
web_page_contents = web_page.read().decode('UTF-8')

# Write the contents to a text file (overwriting the file
# if it already exists!)
html_file = open('download.html', 'w', encoding = 'UTF-8')
html_file.write(web_page_contents)
html_file.close()
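One possible (but not guaranteed) workaround for servers that refuse non-browser clients is to supply a browser-like User-Agent header via a Request object. This is a sketch only; the header value is illustrative, and some sites may still refuse such requests:

```python
from urllib.request import Request  # urlopen also accepts a Request object

url = 'http://www.wikipedia.org/'
# Pretend to be a regular browser; the header value is illustrative
request = Request(url, headers = {'User-Agent': 'Mozilla/5.0'})
# The page could then be fetched with urlopen(request); here we only
# inspect the stored header, to avoid a network round trip
print(request.get_header('User-agent'))
```

Note that urllib normalises header names, so the stored key is 'User-agent' rather than 'User-Agent'.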
Solution
news_archivist.py
#-----Task Description-----------------------------------------------#
#
# News Archivist
#
# In this task you will combine your knowledge of HTML/XML mark-up
# languages with your skills in Python scripting, pattern matching
# and Graphical User Interface development to produce a useful
# application for maintaining and displaying archived news or
# current affairs stories on a topic of your own choice. See the
# instruction sheet accompanying this file for full details.
#
#--------------------------------------------------------------------#

#-----Imported Functions---------------------------------------------#
#
# Below are various import statements that were used in our sample
# solution. You should be able to complete this assignment using
# these functions only.

# Import the function for opening a web document given its URL.
from urllib.request import urlopen

# Import the function for finding all occurrences of a pattern
# defined via a regular expression, as well as the "multiline"
# and "dotall" flags.
from re import findall, MULTILINE, DOTALL

# A function for opening an HTML document in your operating
# system's default web browser. We have called the function
# "webopen" so that it isn't confused with the "open" function
# for writing/reading local text files.
from webbrowser import open as webopen

# An operating system-specific function for getting the current
# working directory/folder. Use this function to create the
# full path name to your HTML document.
from os import getcwd

# An operating system-specific function for 'normalising' a
# path to a file to the path-naming conventions used on this
# computer. Apply this function to the full name of your
# HTML document so that your program will work on any
# operating system.
from os.path import normpath

# Import the standard Tkinter GUI functions.
from tkinter import *

# Import the SQLite functions.
from sqlite3 import *

# Import the date and time function.
from datetime import datetime
#
#--------------------------------------------------------------------#
# RSS feed to parse
src_url = 'https://www.wired.com/feed/'

# GUI logo
logo = 'http://www.x-architects.com/sites/default/files/wired%20logo.gif'

# The name of the generated HTML document
document_name = 'document.html'
def parse_story(text, number):
    # Extract each required element from one <item> block
    link = findall(r'<link>(.*?)</link>', text, DOTALL)[0].strip()
    title = findall(r'<title>(.*?)</title>', text, DOTALL)[0].strip()
    image = findall(r'<media:thumbnail url="(.*?)".*?/>', text, DOTALL)[0].strip()
    summary = findall(r'<description>(.*?)</description>', text, DOTALL)[0].strip()
    pubdate = findall(r'<pubDate>(.*?)</pubDate>', text, DOTALL)[0].strip()
    return (number, link, title, image, summary, pubdate)

def get_stories(text):
    items = findall(r'<item>(.*?)</item>', text, DOTALL)
    stories = []
    # Parse the top ten stories in the text
    for i in range(10):
        story = parse_story(items[i], i + 1)
        stories.append(story)
    return stories
def story_to_html(story):
    number, link, title, image, summary, pubdate = story
    text = """
<hr/>
<article>
<h2>{0}. {2}</h2>
<img src="{3}" alt="Sorry, image {3} not found."/>
<div>{4}</div>
<p>Full Story: <a href="{1}">{1}</a></p>
<p>Dateline: {5}</p>
</article>
""".format(*story)
    return text
def make_document(document_name, stories, date):
    stories_html = "".join(story_to_html(story) for story in stories)
    text = """<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="utf-8">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<title>WIRED News Archive</title>
<style>
html { background-color: #bdbdbd; font-family: sans, sans-serif; }
body { margin: auto; padding: 1em; width: 75%; background-color: #f0f0f0; }
header { text-align: center; padding: 1em 2em; }
header img { width: 80%; }
header p { text-align: left; }
article { text-align: center; padding: 1em 3em; }
article img { max-height: 40%; max-width: 70%; border: 1px solid black; }
article p { text-align: left; }
</style>
</head>
<body>
<header>
<h1>WIRED News Archive</h1>
<h3>""" + date + """</h3>
<img src="http://nicolesharp.com/wp-content/uploads/2017/04/Wired_logo.png" alt="wired logo" />
<p>News source: <a href="https://www.wired.com/feed/">https://www.wired.com/feed/</a></p>
<p>Archivist: Grimly Feendish</p>
</header>
""" + stories_html + """
</body>
</html>
"""
    # Write the output file
    html_file = open(document_name, 'w', encoding = 'UTF-8')
    html_file.write(text)
    html_file.close()
# Open (or create) the event log database, ensuring the
# Event_Log table exists before any events are inserted
conn = connect('event_log.db')
conn.execute('create table if not exists Event_Log (Description text)')

# Create a window
window = Tk()

# Window parameters
window.title('Wired Old News Archive')
window.config(bg = 'white')
window.geometry('650x300')

frame_left = Frame(window, padx = 10, pady = 10, bg = 'white')
frame_left.pack(side = LEFT)

# App logo and label
photo = PhotoImage(file = 'logo.gif')
w = Label(frame_left, image = photo)
w.photo = photo
w.pack(side = TOP)
Label(frame_left, text = 'WIRED', bg = 'white', font = 'sans 30 bold').pack(side = TOP)
Label(frame_left, text = 'News Archive', bg = 'white', font = 'sans 16').pack(side = TOP)

frame_right = Frame(window, padx = 10, pady = 10, bg = 'white')
frame_right.pack(side = LEFT, fill = X)
frame_top = Frame(frame_right, padx = 10, pady = 20, bg = 'white')
frame_top.pack(side = TOP)

# Status label
status = StringVar()
status_label = Label(frame_top, textvariable = status, bg = 'white', font = 'sans 12')
status_label.pack()
status.set('Please choose a date ...')

frame_bottom = Frame(frame_right, bg = 'white')
frame_bottom.pack(side = TOP)
frame_left2 = Frame(frame_bottom, bg = 'white')
frame_left2.pack(side = LEFT)

logging = IntVar()
listbox = Listbox(frame_left2, selectmode = SINGLE)
listbox.pack()
archive_dates = ['Wed, 18 Oct 2017', 'Thu, 19 Oct 2017', 'Fri, 20 Oct 2017',
                 'Sat, 21 Oct 2017', 'Sun, 22 Oct 2017', 'Mon, 23 Oct 2017',
                 'Tue, 24 Oct 2017', 'Latest']
for item in archive_dates:
    listbox.insert(END, item)
def log_event(descr):
    # Record an event description in the database
    c = conn.cursor()
    c.execute('insert into Event_Log (Description) values (?)', (descr,))
    conn.commit()

def archive_latest():
    status.set('Downloading Latest News ...')
    # Open the web document for reading
    web_page = urlopen(src_url)
    # Read its contents as a Unicode string
    web_page_contents = web_page.read().decode('UTF-8')
    # Write the contents to a text file (overwriting the
    # file if it already exists!)
    # output_name = datetime.now().strftime('%a, %d %b %Y') + '.xml'
    output_name = 'Latest.xml'
    output_path = normpath(internet_archive + '/' + output_name)
    html_file = open(output_path, 'w', encoding = 'UTF-8')
    html_file.write(web_page_contents)
    html_file.close()
    status.set('Done')
    if logging.get() == 1:
        descr = 'Latest News downloaded and stored in archive'
        log_event(descr)
def extract_news():
    items = [archive_dates[int(item)] for item in listbox.curselection()]
    if not items:
        status.set('No date selected!')
    else:
        try:  # make sure the archive exists
            status.set('Extracting news')
            # Read the archived file into a string
            file_name = normpath(internet_archive + '/' + items[0] + '.xml')
            html_file = open(file_name, 'r', encoding = 'UTF-8')
            text = html_file.read()
            html_file.close()
            stories = get_stories(text)
            # Convert the stories to an HTML document
            make_document(document_name, stories, items[0])
            status.set('News extracted from archive')
            if logging.get() == 1:
                descr = 'News extracted from archive'
                log_event(descr)
        except IOError:
            status.set('News file not found!')

def display_extracted():
    # Open the extracted document in the default browser
    path = getcwd() + '/' + document_name
    webopen(normpath(path))
    if logging.get() == 1:
        descr = 'Extracted news displayed in web browser'
        log_event(descr)

def toggle_logging():
    if logging.get() == 1:
        descr = 'Event logging switched on'
    else:
        descr = 'Event logging switched off'
    log_event(descr)
frame_right2 = Frame(frame_bottom, padx = 20, bg = 'white')
frame_right2.pack(side = LEFT)
Button(frame_right2, text = 'Extract News from Archive', bg = 'white',
       command = extract_news).pack(fill = X)
Button(frame_right2, text = 'Archive the Latest News', bg = 'white',
       command = archive_latest).pack(fill = X, pady = 5)
Button(frame_right2, text = 'Display News Extracted', bg = 'white',
       command = display_extracted).pack(fill = X)
Checkbutton(frame_right2, text = 'Log Events', bg = 'white', variable = logging,
            pady = 5, command = toggle_logging).pack(fill = X)

window.mainloop()
conn.close()