Internet Archive Lite

Assignment

Motivation

The Internet gives us access to a vast amount of data in digital form, but one of its disadvantages is that this data is ephemeral. Web sites appear, update their contents, and disappear at bewildering speed. While we can still read books printed hundreds of years ago, or even a papyrus written thousands of years in the past, we often fail to find a web site or online article from just a few years ago!

One prominent attempt to preserve digital data is the Internet Archive, a vast repository of web pages downloaded and stored over a long period of time. Via the Internet Archive's Wayback Machine interface you can retrieve archived copies of web pages from specific dates in the past.

In this assignment you will build your own Internet Archive-like IT system. Your system will give its user the ability to view news or current affairs articles that have been stored in a permanent archive, as well as the current news. It will have a graphical interface that allows the user to select dates from the archive, after which the relevant news articles can be extracted and viewed in a standard web browser. It will also allow the user to access the latest data from an online "feed".

Goal

For the purposes of this assignment you need to choose a regularly-updated source of online news or current affairs. You have an entirely free choice of the news category, which could be:
• world news,
• politics,
• fashion and lifestyle,
• arts and entertainment,
• sports,
• business and finance,
• science and technology,
• etc.

Whatever category you choose, you must be able to find an online web site which contains at
least ten current stories or articles on the topic at all times. The web site must be updated
regularly, typically at least once a day. Each of its articles must contain a heading, a short
synopsis, a (link to a) photograph, a link to a full description of the story, and a publication
date.
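In an RSS/XML feed these five elements typically appear as tags inside each item, so a quick way to vet a candidate site is to pull one element out with a regular expression. The sketch below uses made-up sample markup and typical RSS tag names; the site you choose may mark things up differently.

```python
# Sketch: checking that a candidate feed exposes the required elements.
# The sample text below imitates one RSS <item>; real feeds vary.
from re import findall, DOTALL

sample_item = """<item>
<title>Example headline</title>
<description>A short synopsis of the story.</description>
<media:thumbnail url="http://example.com/photo.jpg" />
<link>http://example.com/full-story</link>
<pubDate>Wed, 18 Oct 2017 09:00:00 +0000</pubDate>
</item>"""

# One findall per required element; DOTALL lets '.' span newlines
headings = findall(r'<title>(.*?)</title>', sample_item, DOTALL)
dates = findall(r'<pubDate>(.*?)</pubDate>', sample_item, DOTALL)
print(headings[0])  # Example headline
print(dates[0])     # Wed, 18 Oct 2017 09:00:00 +0000
```

If any of the five `findall` checks comes back empty for every article, the site does not meet the requirements and you should look for another one.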

Specific requirements

To complete this task you are required to produce an application in Python like the one described above,
using the provided news_archivist.py template file as your starting point.

It is suggested that you use the following development process:
1. Find a suitable source of online news articles. Keep in mind that the web site you
choose must be updated daily, must have at least ten articles online at any time, and
each article must have a heading, a short synopsis, a (link to a) photograph, a link to a
full description of the story, and a publication date. Some starting points for finding
such sites can be found in Appendix A below.
2. Create your “Internet Archive” by downloading seven copies of the web site, one per
day.
3. Study the HTML/XML source code of the archived documents to determine how the
elements you want to extract are marked up. Typically you have to identify the
markup tags, and perhaps other unchanging parts of the document that uniquely identify the beginning and end of the text and image addresses you want to extract.
4. Using the provided regex_tester.py application, devise regular expressions
which extract just the necessary elements from the relevant parts of the archived web
documents.
5. You can now develop a simple prototype of your “back end” function(s) that just extracts and saves the required elements from an archived web document.
6. Design the HTML source code for your “extracted” news stories, with appropriate
placeholders for the downloaded web elements you will insert.
7. Develop the necessary Python code to extract the HTML elements from an archived
document and create the HTML file. This completes the major “back end” part of your
solution.
8. Develop a function to download the “latest” news and save it in the archive.
9. Develop a function to open a given HTML document in the host computer’s default
web browser.
10. Add the Graphical User Interface “front end” to your program. Decide whether you
want to use push buttons, radio buttons, menus, lists or some other mechanism for
choosing, extracting and displaying archived news.
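Steps 4 to 7 above amount to a small "back end" pipeline: read one archived document, extract the marked-up elements with regular expressions, and substitute them into an HTML template. A minimal sketch of that pipeline follows; the tag names and the two-field extraction are illustrative assumptions, and your own version will need all five required elements for your chosen feed.

```python
# Sketch of the back-end pipeline (steps 4-7).  The <item>/<title>/<link>
# tag names are typical RSS markup, used here only as an example.
from re import findall, DOTALL

def extract_articles(text):
    """Return (title, link) pairs for each <item> in an archived feed."""
    articles = []
    for item in findall(r'<item>(.*?)</item>', text, DOTALL):
        title = findall(r'<title>(.*?)</title>', item, DOTALL)[0].strip()
        link = findall(r'<link>(.*?)</link>', item, DOTALL)[0].strip()
        articles.append((title, link))
    return articles

def articles_to_html(articles):
    """Fill a simple HTML template with the extracted elements."""
    body = ''.join('<h2>{0}</h2>\n<p><a href="{1}">{1}</a></p>\n'.format(title, link)
                   for title, link in articles)
    return '<!DOCTYPE html>\n<html><body>\n' + body + '</body></html>\n'

# A one-item stand-in for an archived feed document
feed = '<item><title>A story</title><link>http://example.com/a</link></item>'
print(articles_to_html(extract_articles(feed)))
```

Developing and testing the extraction functions on a saved archive file first, before adding any GUI code, makes the regular expressions much easier to debug.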

regex_tester.py

#-----Description----------------------------------------------------#
#
#  This program provides a simple Graphical User Interface that
#  helps you develop regular expressions.  It allows you to enter a
#  block of text and a regular expression and see what matches
#  are found.  (Similar web browser-based tools can be found online,
#  but the advantage of this one is that it's written in Python, so
#  we know for certain that it obeys Python's regular expression
#  syntax.)
#
#--------------------------------------------------------------------#

#-----Useful constants-----------------------------------------------#
#
#  These constants control the text widgets in the GUI.  Change them
#  if you want to modify the widgets in which text is displayed.

FontSize = 14          # Size of the font used for all widgets
InputWidgetWidth = 60  # Width of the search text and regex widgets (chars)
SearchTextDepth = 15   # Depth of the search text widget (lines)
MatchesWidth = 25      # Width of the matches found widget (chars)
MatchesDepth = 20      # Depth of the matches found widget (lines)

#
#--------------------------------------------------------------------#

#-----Main program---------------------------------------------------#
#

# Import the necessary regular expression functions
from re import findall, MULTILINE, DOTALL, finditer

# Import the Tkinter functions
from tkinter import *

# Create a window
regex_window = Tk()

# Give the window a title
regex_window.title('Regular Expression Tester')

# Create some instructions
search_text_instruction = "Enter the text you want to search here"
regex_instruction = "Enter your regular expression here"
results_instruction = '''Instructions:

All matches found are displayed here.

Quotation marks are used to mark the beginning and end of each match - they are not part of the string returned.

Matches are displayed in the order found.

The matches are hyperlinks - click on them to see where the match occurs in the text.

Newline or tab characters in the match are shown as \\n and \\t.

Carriage returns (\\r) are deleted from the search text before searching.

If 'multiline' is enabled the beginning and end of individual lines can be matched as ^ and $, respectively, otherwise the entire text is treated as a single string containing embedded newlines.

If 'dotall' is enabled then a '.' in a regular expression can match a newline character, otherwise '.' does not match newlines.

If the pattern contains more than one group, each match shows all groups.

Note that you may not be able to directly copy-and-paste your regex into a Python script if it contains quote marks or other characters that are meaningful to Python - in this case you need to "escape" the special characters.'''


class HyperlinkManager:
    # Adapted from
    # http://effbot.org/zone/tkinter-text-hyperlink.htm

    # Constructor takes the Text widget to attach to as an argument
    def __init__(self, text):
        self.text = text
        # Formatting for the link tags
        self.text.tag_config("hyper", foreground="blue")
        # Bind the mouse events to the functions (defined below)
        self.text.tag_bind("hyper", "<Enter>", self._enter)
        self.text.tag_bind("hyper", "<Leave>", self._leave)
        self.text.tag_bind("hyper", "<Button-1>", self._click)
        # Create the empty dictionaries
        self.reset()

    # Clean up function
    def reset(self):
        self.starts = {}
        self.ends = {}

    # Save the positions in the search text to jump to;
    # returns tags to use in the associated text widget
    def add(self, start, end):
        tag = "hyper-%d" % len(self.starts)
        self.starts[tag] = start
        self.ends[tag] = end
        # Return two tags, one just 'hyper', and one with the number after;
        # first tag is for the formatting, second is for deciding which
        # action to take
        return "hyper", tag

    # What happens when you mouse over the link text
    def _enter(self, event):
        self.text.config(cursor="hand2")

    # What happens when the mouse leaves the link text
    def _leave(self, event):
        self.text.config(cursor="")

    # What happens when you click the text
    def _click(self, event):
        for tag in self.text.tag_names(CURRENT):
            if tag[:6] == "hyper-":
                start = self.starts[tag]
                end = self.ends[tag]
                # Unhighlight any currently highlit text
                search_text.tag_remove('highlight', '1.0', 'end')
                # Move the marks to the section to be highlighted
                search_text.mark_set("matchStart", start)
                search_text.mark_set("matchEnd", end)
                # Highlight the section
                search_text.tag_add("highlight", "matchStart", "matchEnd")
                # Scroll down enough to see the end of the match, then up to
                # see the start - this is to try to get as much of the match
                # as possible in view, but make sure the start of the match
                # is visible
                search_text.see(end)
                search_text.see(start)
                # Should only be one link tag per bit of text
                return


# Define the fonts we want to use, including a
# fixed-width one which makes all characters easy to see
fixed_font = ('Courier', FontSize)
label_font = ('Calisto', FontSize, 'bold')
ghost_font_colour = '#888888'
regular_font_colour = '#000000'

# Create a text editing widget for the text to be searched
search_text = Text(regex_window, width = InputWidgetWidth,
                   height = SearchTextDepth, wrap = WORD,
                   bg = 'light grey', font = fixed_font,
                   borderwidth = 2, relief = 'groove',
                   takefocus = False)
search_text.insert(END, search_text_instruction)
search_text.grid(row = 1, column = 0, padx = 5)
search_text.configure(foreground = ghost_font_colour)
search_text.tag_configure('highlight', background = '#FF0000')

# Clear the search text entry widget first time it is clicked
search_text.pristine = True

def search_text_click(event):
    if search_text.pristine:
        search_text.delete(0.0, END)
        search_text.pristine = False
        search_text.configure(foreground = regular_font_colour)

search_text.bind("<Button-1>", search_text_click)

# Create label widgets to describe the boxes
matches_found = Label(regex_window, text = 'Matches found:',
                      font = label_font)
matches_found.grid(row = 0, column = 1, sticky = W, padx = 5)

enter_regex = Label(regex_window, text = 'Regular expression:',
                    font = label_font)
enter_regex.grid(row = 2, column = 0, sticky = W, padx = 5)

text_to_search = Label(regex_window, text = 'Text to be searched:',
                       font = label_font)
text_to_search.grid(row = 0, column = 0, sticky = W, padx = 5)

# Create a text widget to display the matches found
results_text = Text(regex_window, font = fixed_font,
                    width = MatchesWidth, height = MatchesDepth,
                    wrap = WORD, bg = 'light green',
                    borderwidth = 2, relief = 'groove',
                    takefocus = False)
results_text.insert(END, results_instruction)
results_text.grid(row = 1, column = 1, rowspan = 4, padx = 5, sticky = N)

results_hyperlink = HyperlinkManager(results_text)

# Create a frame to hold the controls
controls = Frame(regex_window)
controls.grid(row = 4, column = 0, padx = 5, pady = 5)

# Create a checkbutton to allow the user to enable multiline mode
multiline_on = BooleanVar()
multi_button = Checkbutton(controls, text = 'Multiline', font = label_font,
                           variable = multiline_on, takefocus = False)
multi_button.grid(row = 0, column = 1, padx = 5)

# Create a checkbutton to allow the user to enable dotall mode
dotall_on = BooleanVar()
dotall_button = Checkbutton(controls, text = 'Dotall', font = label_font,
                            variable = dotall_on, takefocus = False)
dotall_button.grid(row = 0, column = 2, padx = 5)

# Create a text editing widget for the regular expression
reg_exp = Entry(regex_window, font = fixed_font,
                width = InputWidgetWidth, bg = 'light yellow')
reg_exp.insert(END, regex_instruction)
reg_exp.grid(row = 3, column = 0, sticky = E, padx = 5)
reg_exp.selection_range(0, END)  # select all text if we "tab" into the widget
reg_exp.configure(foreground = ghost_font_colour)

# Clear the regular expression entry widget first time it is clicked
reg_exp.pristine = True

def reg_exp_click(event):
    if reg_exp.pristine:
        reg_exp.delete(0, END)
        reg_exp.pristine = False
        reg_exp.configure(foreground = regular_font_colour)

reg_exp.bind("<Button-1>", reg_exp_click)

# Function to format a single match.  This is made more complicated
# than we'd like because Python's findall function usually returns a list
# of matching strings, but if the regular expression contains more than
# one group then it returns a list of tuples where each tuple contains
# the individual matches for each group.
def format_match(result):
    if type(result) is tuple:
        formatted = ()
        for match in result:
            # make the match a "normal" string (not unicode)
            # match = match.encode('utf8')
            # make newline and tab characters in the match visible
            match = match.replace('\n', '\\n')
            match = match.replace('\t', '\\t')
            # put it in the resulting tuple
            formatted = formatted + (match,)
    else:
        # get rid of any unicode characters in the result
        # result = result.encode('utf8')
        # make newline and tab characters in the result visible
        formatted = result.replace('\n', '\\n')
        formatted = formatted.replace('\t', '\\t')
        # put quotes around the result, to help us see empty
        # results or results containing spaces at either end
        formatted = "'" + formatted + "'"
    # return either form as a printable string
    return str(formatted)

# Function to find and display results.  This version has
# been made robust to user error, through the use of
# exception handling (a topic we'll cover later).
# The optional 'event' parameter allows this function to be
# the target of a key binding.
def find_matches(event = None):
    # Clear the highlight tag
    search_text.tag_remove('highlight', '1.0', 'end')
    # Remove the hyperlinks
    results_hyperlink.reset()
    # Clear the results box
    results_text.delete(0.0, END)
    # Delete any carriage returns (\r) in the search text,
    # leaving just newlines (\n), to allow for text pasted from
    # an environment with different end-of-line conventions
    text_to_search = search_text.get(0.0, END)
    text_to_search = text_to_search.replace('\r', '')
    search_text.delete(0.0, END)
    search_text.insert(0.0, text_to_search)
    # Attempt to find the pattern and display the results
    try:
        # Do a single string or multiline or dotall search,
        # depending on whether or not the user has
        # enabled multiline or dotall mode
        flags = 0
        if multiline_on.get():
            flags = flags | MULTILINE
        if dotall_on.get():
            flags = flags | DOTALL
        # Perform the search
        results = finditer(reg_exp.get(), text_to_search, flags = flags)
        # Display the outcome
        results_text['bg'] = 'light green'
        # If item_num is still -1 after the loop, there were no results
        item_num = -1
        for item_num, match in enumerate(results):
            # Get the string result from the match
            result = match.group(0)
            # Get the index of the start and end of the match in the
            # search text box
            start_index = search_text.index('1.0+%dc' % match.start())
            end_index = search_text.index('1.0+%dc' % match.end())
            if len(match.groups()) == 0:
                result = format_match(result)
            elif len(match.groups()) == 1:
                result = format_match(match.groups()[0])
            else:
                result = format_match(match.groups())
            # Insert the result with the hyperlink
            results_text.insert(END, result,
                                results_hyperlink.add(start_index, end_index))
            # Add the newline separately so the hyperlink doesn't apply
            # to the whitespace on the right
            results_text.insert(END, '\n')
        # This condition is True if no results were returned
        if item_num == -1:
            results_text['bg'] = 'khaki'
            results_text.insert(END, 'No matches found\n')
    # If anything goes wrong tell the user and assume the failure was due to
    # a malformed regular expression
    except Exception as exception_type:
        results_text['bg'] = 'coral'
        results_text.insert(END, 'Invalid regular expression:\n' +
                            str(exception_type))

# Create a button widget to start the search
search_button = Button(controls, text = 'Show matches',
                       takefocus = False, command = find_matches,
                       font = label_font)
search_button.grid(row = 0, column = 0)

# Also allow users to start the search by typing a carriage return
# in the regular expression field
reg_exp.bind('<Return>', find_matches)

# Start the event loop
regex_window.mainloop()

#
#--------------------------------------------------------------------#

news_archivist.py

#-----Task Description-----------------------------------------------#
#
#  News Archivist
#
#  In this task you will combine your knowledge of HTML/XML mark-up
#  languages with your skills in Python scripting, pattern matching
#  and Graphical User Interface development to produce a useful
#  application for maintaining and displaying archived news or
#  current affairs stories on a topic of your own choice.  See the
#  instruction sheet accompanying this file for full details.
#
#--------------------------------------------------------------------#

#-----Imported Functions---------------------------------------------#
#
# Below are various import statements that were used in our sample
# solution.  You should be able to complete this assignment using
# these functions only.

# Import the function for opening a web document given its URL.
from urllib.request import urlopen

# Import the function for finding all occurrences of a pattern
# defined via a regular expression, as well as the "multiline"
# and "dotall" flags.
from re import findall, MULTILINE, DOTALL

# A function for opening an HTML document in your operating
# system's default web browser. We have called the function
# "webopen" so that it isn't confused with the "open" function
# for writing/reading local text files.
from webbrowser import open as webopen

# An operating system-specific function for getting the current
# working directory/folder.  Use this function to create the
# full path name to your HTML document.
from os import getcwd

# An operating system-specific function for 'normalising' a
# path to a file to the path-naming conventions used on this
# computer.  Apply this function to the full name of your
# HTML document so that your program will work on any
# operating system.
from os.path import normpath

# Import the standard Tkinter GUI functions.
from tkinter import *

# Import the SQLite functions.
from sqlite3 import *

# Import the date and time function.
from datetime import datetime

#
#--------------------------------------------------------------------#

#-----Student's Solution---------------------------------------------#
#
# Put your solution at the end of this file.
#
# Name of the folder containing your archived web documents.  When
# you submit your solution you must include the web archive along with
# this Python program. The archive must contain one week's worth of
# downloaded HTML/XML documents. It must NOT include any other files,
# especially image files.
internet_archive = 'InternetArchive'

################ PUT YOUR SOLUTION HERE #################

pass

downloader.py

#-----------------------------------------------------------
#
# Web Document Downloader
#
# This simple program is a stand-alone tool to download
# and save the source code of a given web document. For a
# particular URL, it downloads the corresponding web
# document as a Unicode character string and saves it to
# a file.  NB: This script assumes the source file is
# encoded as UTF-8.
#
# Q: Why not just access the web page's source code via your
# favourite web browser (Firefox, Google Chrome, etc)?
#
# A: Because when a Python script requests a web document
# from an online server it may not receive the same file
# you see in your browser!  Many web servers generate
# different HTML/XML code for different clients.
#
# Worse, some web servers may refuse to send documents to
# programs other than standard web browsers.  If a Python
# script requests a web document they may instead respond
# with an "access denied" document!  In this case you'll
# just have to try another web page.
#

# Put your web page address here
url = 'http://www.wikipedia.org/'  # this web site is nice and doesn't block access
# url = 'http://www.wayofcats.com/blog/'  # this web site is nasty and blocks access by Python scripts

# Import the function for opening online documents
from urllib.request import urlopen

# Open the web document for reading
web_page = urlopen(url)

# Read its contents as a Unicode string
web_page_contents = web_page.read().decode('UTF-8')

# Write the contents to a text file (overwriting the file if it
# already exists!)
html_file = open('download.html', 'w', encoding = 'UTF-8')
html_file.write(web_page_contents)
html_file.close()
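To build the week-long archive required by step 2, each day's download needs its own file name. One way to do that is to stamp the file with its download date; the sketch below shows the idea, with the folder name, date format, and `.xml` extension as illustrative choices rather than requirements.

```python
# Sketch: naming each day's download after its date, so seven runs of
# the downloader produce seven distinct archive files.  The folder name
# and date format here are illustrative assumptions.
from datetime import datetime
from os.path import normpath

def archive_path(folder, when):
    """Build a platform-normalised path such as
    InternetArchive/Wed, 18 Oct 2017.xml"""
    stamp = when.strftime('%a, %d %b %Y')
    return normpath(folder + '/' + stamp + '.xml')

path = archive_path('InternetArchive', datetime(2017, 10, 18))
print(path)  # e.g. InternetArchive/Wed, 18 Oct 2017.xml (separator varies by OS)
```

Because the stamp only changes once per day, re-running the downloader on the same day simply overwrites that day's copy rather than cluttering the archive.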

Solution 

news_archivist.py

#-----Task Description-----------------------------------------------#
#
#  News Archivist
#
#  In this task you will combine your knowledge of HTML/XML mark-up
#  languages with your skills in Python scripting, pattern matching
#  and Graphical User Interface development to produce a useful
#  application for maintaining and displaying archived news or
#  current affairs stories on a topic of your own choice.  See the
#  instruction sheet accompanying this file for full details.
#
#--------------------------------------------------------------------#

#-----Imported Functions---------------------------------------------#
#
# Below are various import statements that were used in our sample
# solution.  You should be able to complete this assignment using
# these functions only.

# Import the function for opening a web document given its URL.
from urllib.request import urlopen

# Import the function for finding all occurrences of a pattern
# defined via a regular expression, as well as the "multiline"
# and "dotall" flags.
from re import findall, MULTILINE, DOTALL

# A function for opening an HTML document in your operating
# system's default web browser. We have called the function
# "webopen" so that it isn't confused with the "open" function
# for writing/reading local text files.
from webbrowser import open as webopen

# An operating system-specific function for getting the current
# working directory/folder.  Use this function to create the
# full path name to your HTML document.
from os import getcwd

# An operating system-specific function for 'normalising' a
# path to a file to the path-naming conventions used on this
# computer.  Apply this function to the full name of your
# HTML document so that your program will work on any
# operating system.
from os.path import normpath

# Import the standard Tkinter GUI functions.
from tkinter import *

# Import the SQLite functions.
from sqlite3 import *

# Import the date and time function.
from datetime import datetime

#
#--------------------------------------------------------------------#

# RSS feed to parse
src_url = 'https://www.wired.com/feed/'

# GUI logo
logo = 'http://www.x-architects.com/sites/default/files/wired%20logo.gif'

# The name of the generated HTML document
document_name = 'document.html'

# Name of the folder containing the archived web documents
# (the same constant as in the template above)
internet_archive = 'InternetArchive'

def parse_story(text, number):
    link = findall(r'<link>(.*?)</link>', text, DOTALL)[0].strip()
    title = findall(r'<title>(.*?)</title>', text, DOTALL)[0].strip()
    image = findall(r'<media:thumbnail url="(.*?)" .*/>', text, DOTALL)[0].strip()
    summary = findall(r'<description>(.*?)</description>', text, DOTALL)[0].strip()
    pubdate = findall(r'<pubDate>(.*?)</pubDate>', text, DOTALL)[0].strip()
    return (number, link, title, image, summary, pubdate)

def get_stories(text):
    items = findall(r'<item>(.*?)</item>', text, DOTALL)
    stories = []
    # parse the top 10 stories in the text
    for i in range(10):
        story = parse_story(items[i], i + 1)
        stories.append(story)
    return stories

def story_to_html(story):
    number, link, title, image, summary, pubdate = story
    text = """
<hr/>
<article>
<h2>{0}. {2}</h2>
<img src="{3}" alt="Sorry, image {3} not found."/>
<div>{4}</div>
<p>Full Story: <a href="{1}">{1}</a></p>
<p>Dateline: {5}</p>
</article>
""".format(*story)
    return text

def make_document(document_name, stories, date):
    stories_html = "".join(story_to_html(story) for story in stories)
    text = """<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="utf-8">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<title>WIRED News Archive</title>
<style>
html        { background-color: #bdbdbd; font-family: sans, sans-serif;}
body        { margin: auto; padding: 1em; width: 75%; background-color: #f0f0f0;}
header      { text-align: center; padding: 1em 2em;}
header img  { width: 80%;}
header p    { text-align: left;}
article     { text-align: center; padding: 1em 3em;}
article img { max-height: 40%; max-width: 70%; border: 1px solid black;}
article p   { text-align: left;}
</style>
</head>
<body>
<header>
<h1>WIRED News Archive</h1>
<h3>""" + date + """</h3>
<img src="http://nicolesharp.com/wp-content/uploads/2017/04/Wired_logo.png" alt="wired logo" />
<p>News source: <a href="https://www.wired.com/feed/">https://www.wired.com/feed/</a></p>
<p>Archivist: Grimly Feendish</p>
</header>
""" + stories_html + """
</body>
</html>
"""
    # write output
    html_file = open(document_name, 'w', encoding='UTF-8')
    html_file.write(text)
    html_file.close()

# Open (or create) the event log database and make sure the
# Event_Log table exists before we try to insert into it
conn = connect('event_log.db')
conn.execute('create table if not exists Event_Log (Description text)')

# Create a window
window = Tk()

# Window parameters
window.title('Wired Old News Archive')
window.config(bg='white')
window.geometry('650x300')

frame_left = Frame(window, padx=10, pady=10, bg='white')
frame_left.pack(side=LEFT)

# App logo and label
photo = PhotoImage(file='logo.gif')
w = Label(frame_left, image=photo)
w.photo = photo
w.pack(side=TOP)
Label(frame_left, text='WIRED', bg='white', font='sans 30 bold').pack(side=TOP)
Label(frame_left, text='News Archive', bg='white', font='sans 16').pack(side=TOP)

frame_right = Frame(window, padx=10, pady=10, bg='white')
frame_right.pack(side=LEFT, fill=X)

frame_top = Frame(frame_right, padx=10, pady=20, bg='white')
frame_top.pack(side=TOP)

# Status label
status = StringVar()
statuslabel = Label(frame_top, textvariable=status, bg='white', font='sans 12')
statuslabel.pack()
status.set('Please choose a date ...')

frame_bottom = Frame(frame_right, bg='white')
frame_bottom.pack(side=TOP)

frame_left2 = Frame(frame_bottom, bg='white')
frame_left2.pack(side=LEFT)

logging = IntVar()

listbox = Listbox(frame_left2, selectmode=SINGLE)
listbox.pack()
archive_dates = ['Wed, 18 Oct 2017', 'Thu, 19 Oct 2017', 'Fri, 20 Oct 2017',
                 'Sat, 21 Oct 2017', 'Sun, 22 Oct 2017', 'Mon, 23 Oct 2017',
                 'Tue, 24 Oct 2017', 'Latest']
for item in archive_dates:
    listbox.insert(END, item)

def log_event(descr):
    c = conn.cursor()
    c.execute('insert into Event_Log (Description) values (?)', (descr,))
    conn.commit()

def archive_latest():
    status.set('Downloading Latest News ...')
    # Open the web document for reading
    web_page = urlopen(src_url)
    # Read its contents as a Unicode string
    web_page_contents = web_page.read().decode('UTF-8')
    # Write the contents to a text file (overwriting the file if it
    # already exists!)
    # output_name = datetime.now().strftime('%a, %d %b %Y') + '.xml'
    output_name = 'Latest.xml'
    output_path = normpath(internet_archive + '/' + output_name)
    html_file = open(output_path, 'w', encoding='UTF-8')
    html_file.write(web_page_contents)
    html_file.close()
    status.set('Done')
    if logging.get() == 1:
        descr = 'Latest News downloaded and stored in archive'
        log_event(descr)

def extract_news():
    items = [archive_dates[int(item)] for item in listbox.curselection()]
    if not items:
        status.set('No date selected!')
    else:
        try:  # make sure the archive exists
            status.set('Extracting news')
            # read the file into a string
            file_name = normpath(internet_archive + '/' + items[0] + '.xml')
            html_file = open(file_name, 'r', encoding='UTF-8')
            text = html_file.read()
            html_file.close()
            stories = get_stories(text)
            # convert to an HTML document
            make_document(document_name, stories, items[0])
            status.set('News extracted from archive')
            if logging.get() == 1:
                descr = 'News extracted from archive'
                log_event(descr)
        except IOError:
            status.set('News file not found!')

def display_extracted():
    # open in the default browser
    path = getcwd() + '/' + document_name
    webopen(normpath(path))
    if logging.get() == 1:
        descr = 'Extracted news displayed in web browser'
        log_event(descr)

def toggle_logging():
    if logging.get() == 1:
        descr = 'Event logging switched on'
    else:
        descr = 'Event logging switched off'
    log_event(descr)

frame_right2 = Frame(frame_bottom, padx=20, bg='white')
frame_right2.pack(side=LEFT)
Button(frame_right2, text='Extract News from Archive', bg='white',
       command=extract_news).pack(fill=X)
Button(frame_right2, text='Archive the Latest News', bg='white',
       command=archive_latest).pack(fill=X, pady=5)
Button(frame_right2, text='Display News Extracted', bg='white',
       command=display_extracted).pack(fill=X)
Checkbutton(frame_right2, text='Log Events', bg='white', variable=logging,
            pady=5, command=toggle_logging).pack(fill=X)

window.mainloop()

conn.close()