Detecting Counterfeit Currency in Python

Detecting Counterfeit Currency in Python Homework Sample

We have a dataset of counterfeit and genuine banknotes. Each row has 5 attributes: the variance, skewness, and kurtosis of the wavelet-transformed image, the entropy of the image, and the class (0 = counterfeit, 1 = genuine). You will be using training data and test data, and the program should report the accuracy of the classifier on the test data.
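To make the row layout concrete, here is a minimal sketch of separating the four features from the class label. The two sample rows are invented for illustration and are not taken from the real dataset, and `split_features_labels` is a hypothetical helper name, not part of the solution below.

```python
import numpy as np
from io import StringIO

# Invented rows in the layout described above:
# variance, skewness, kurtosis, entropy, class
SAMPLE = "1.5,2.0,-0.5,0.25,0\n-2.0,3.5,1.0,-1.25,1\n"

def split_features_labels(text):
    """Parse comma-separated rows and split them into features and labels."""
    rows = np.loadtxt(StringIO(text), delimiter=",", ndmin=2)
    features = rows[:, :-1]           # first four columns
    labels = rows[:, -1].astype(int)  # last column: 0 = counterfeit, 1 = genuine
    return features, labels

features, labels = split_features_labels(SAMPLE)
print(features.shape, labels)  # (2, 4) [0 1]
```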

Solution:

data.py

"""
This module performs all the operations we need to get our training and testing data.
It has 3 main functions:
1) scraping_data: downloads the data from the source and returns a numpy array that contains it
2) files: takes the array produced by scraping_data(), splits it into a training set
   and a testing set, then saves them into 2 different files
3) read_data: reads the files generated by files() and returns the training data and testing data as numpy arrays
"""

# import modules we will use:
from requests import get # to send GET request to the source website and get the html
from contextlib import closing # to close webpage after reading data from it
import numpy as np # to create arrays from data
from io import BytesIO, StringIO # to convert scraped data into text

def scraping_data(url):
    '''
    function that takes a url and reads its content, then returns
    an array with all the data in this page
    '''
    # read the web page content, then close the connection:
    with closing(get(url, stream=True)) as resp:
        raw_text = BytesIO(resp.content).getvalue()

    # put the page content into string format:
    string = StringIO(raw_text.decode("utf-8"), '\n')
    # make every row in the text a row in the array:
    raw_data = np.genfromtxt(string, dtype='str')
    data = []
    # split every row by ',' to get the columns, convert the values from
    # strings to floats, then append the row to the data list:
    for idx in range(raw_data.shape[0]):
        data.append(list(map(float, raw_data[idx].split(','))))

    return np.array(data)

def files():
    '''
    function splits the data into a training set and a testing set,
    and saves them into two different files
    '''

    # get the data array using the scraping_data function:
    data = scraping_data('https://archive.ics.uci.edu/ml/machine-learning-databases/00267/data_banknote_authentication.txt')

    # split data into counterfeit and real:
    all_counterfeit = data[np.where(data[:, -1] == 0)]
    all_real = data[np.where(data[:, -1] == 1)]

    # get the index of the halfway point in counterfeit and real:
    half_counterfeit = len(all_counterfeit) // 2
    half_real = len(all_real) // 2

    # put half of the real rows and half of the counterfeit rows in the training set,
    # and the other halves in the testing set:
    training = np.concatenate([all_counterfeit[:half_counterfeit], all_real[:half_real]], axis=0)
    testing = np.concatenate([all_counterfeit[half_counterfeit:], all_real[half_real:]], axis=0)

    # save the data into two separate files:
    np.savetxt("training.txt", training)
    np.savetxt("testing.txt", testing)

def read_data():
    '''
    function reads the files generated by the files() function
    and returns 2 numpy arrays with the training and testing data
    '''

    # call the files function to generate the files we will read:
    files()
    # read the files as numpy arrays using the loadtxt method:
    training_data = np.loadtxt("training.txt")
    testing_data = np.loadtxt("testing.txt")
    return training_data, testing_data

if __name__ == "__main__":
    # test the files function to create the training and testing files:
    files()
    # test the read_data function and make sure it returns the training and testing arrays:
    training_data, testing_data = read_data()
    print(training_data.shape, testing_data.shape)
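The files() function above takes the first half of each class in file order. As a variation (an assumption on our part, not what the solution does), the rows of each class could be shuffled before the split so the result does not depend on how the source file is ordered. A minimal sketch, with `stratified_half_split` and its `seed` parameter as hypothetical names:

```python
import numpy as np

def stratified_half_split(data, seed=0):
    """Split rows in half per class (label in the last column), after shuffling."""
    rng = np.random.default_rng(seed)
    train_parts, test_parts = [], []
    for label in np.unique(data[:, -1]):
        rows = data[data[:, -1] == label]
        rows = rows[rng.permutation(len(rows))]  # shuffle within the class
        half = len(rows) // 2
        train_parts.append(rows[:half])
        test_parts.append(rows[half:])
    return np.concatenate(train_parts), np.concatenate(test_parts)

# Invented toy data: 4 counterfeit rows (label 0) and 6 genuine rows (label 1).
toy = np.column_stack([np.arange(10.0), np.array([0, 0, 0, 0, 1, 1, 1, 1, 1, 1])])
train, test = stratified_half_split(toy)
print(train.shape, test.shape)
```

Fixing the seed keeps the split reproducible between runs, which matters when comparing accuracy figures.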

main.py

"""
In this module we put it all together.
It uses methods from the modules we have implemented before
to build and evaluate the classifier, then prints the accuracy.
"""

# import methods we will use from modules we've implemented:
from data import read_data
from train import training
from test import testing

def main():
    # get the training and testing data
    train_data, test_data = read_data()
    # compute the threshold values
    threshold = training(train_data)
    # make predictions and compute the accuracy
    accuracy = testing(threshold, test_data)

    print(f'The accuracy of the classifier = {accuracy}%')

if __name__ == "__main__":
    main()

test.py

"""
This module performs the testing part of the classifier. It has two main functions:
1) accuracy: computes the accuracy of the classifier
2) testing: makes predictions for the testing data based on the threshold values
"""

# import modules we will use:
import numpy as np

def accuracy(labels, prediction):
    '''
    function takes the actual labels and the predicted labels
    as parameters, and returns the percentage of correctly
    classified data (the accuracy)
    '''
    # count the rows where the predicted label matches the actual label:
    correct = sum(1 if labels[i] == prediction[i] else 0 for i in range(len(labels)))
    return (correct / len(labels)) * 100

def testing(threshold, test_data):
    '''
    function takes the threshold values and the testing data,
    predicts a label for every row, then returns the accuracy
    of these predictions
    '''

    test = test_data[:, :-1] # take all columns in the test set except the label
    predictions = []

    # make a prediction for every row in the testing data by comparing
    # every column with its threshold (each column casts one vote)
    for row in test:
        temp = []
        if row[0] >= threshold[0]:
            temp.append(0)
        else:
            temp.append(1)

        if row[1] >= threshold[1]:
            temp.append(0)
        else:
            temp.append(1)

        if row[2] <= threshold[2]:
            temp.append(1)
        else:
            temp.append(0)

        if row[3] >= threshold[3]:
            temp.append(0)
        else:
            temp.append(1)

        # make the prediction based on majority voting (a 2-2 tie counts as 0):
        prediction = 1 if temp.count(1) > temp.count(0) else 0
        predictions.append(prediction)

    # use the accuracy function to compute the accuracy of the classifier:
    acc = accuracy(test_data[:, -1], np.array(predictions))
    return acc

if __name__ == "__main__":
    # test the testing and accuracy functions with an arbitrary array:
    data = np.array([[2, 4, 6, 8, 1],
                     [4, 6, 8, 10, 0],
                     [1, 3, 5, 7, 1],
                     [3, 5, 7, 9, 1]])
    threshold = [2.5, 4.5, 6.5, 8.5]
    acc = testing(threshold, data)
    print(acc)
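The per-row voting loop in testing() can also be written with NumPy array operations. The following is a sketch that is intended to mirror the same comparisons, not a replacement taken from the solution; `predict_majority` is a hypothetical name.

```python
import numpy as np

def predict_majority(features, threshold):
    """Vote per column, then predict 1 (genuine) only on a strict majority."""
    features = np.asarray(features, dtype=float)
    threshold = np.asarray(threshold, dtype=float)
    # columns 0, 1 and 3 vote "genuine" when the value is below the threshold
    votes_genuine = features < threshold
    # column 2 votes "genuine" when the value is at or below the threshold
    votes_genuine[:, 2] = features[:, 2] <= threshold[2]
    ones = votes_genuine.sum(axis=1)
    zeros = features.shape[1] - ones
    return (ones > zeros).astype(int)  # a tie counts as 0 (counterfeit)

# Same arbitrary array as in the self-test above.
data = np.array([[2, 4, 6, 8, 1],
                 [4, 6, 8, 10, 0],
                 [1, 3, 5, 7, 1],
                 [3, 5, 7, 9, 1]])
predictions = predict_majority(data[:, :-1], [2.5, 4.5, 6.5, 8.5])
accuracy_percent = np.mean(predictions == data[:, -1]) * 100
print(predictions.tolist(), accuracy_percent)  # [1, 0, 1, 0] 75.0
```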

train.py

"""
This module performs the training part of the classifier. It has one main function:
1) training: calculates the threshold values that we will use to classify our data
"""

# import modules we will use:
import numpy as np # to deal with arrays

def training(data):
    '''
    function takes the training data and returns the threshold
    values that will be used in classification
    '''
    # split data into the counterfeit part and the real part (dropping the label column):
    counterfeit = data[np.where(data[:, -1] == 0)][:, :-1]
    real = data[np.where(data[:, -1] == 1)][:, :-1]

    # calculate the mean value of every column in the counterfeit rows and the real rows:
    counterfeit_mean = np.mean(counterfeit, axis=0)
    real_mean = np.mean(real, axis=0)

    # get the threshold values by adding the counterfeit mean values and the real mean values,
    # then dividing the result by 2 (the midpoint between the two class means):
    threshold = np.add(counterfeit_mean, real_mean) / 2

    return threshold

if __name__ == "__main__":
    # test the training function with an arbitrary array:
    data = np.array([[2, 4, 6, 8, 0],
                     [4, 6, 8, 10, 0],
                     [1, 3, 5, 7, 1],
                     [3, 5, 7, 9, 1]])
    threshold = training(data)
    print(threshold)
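The thresholds are midpoints between the two class means, while test.py hard-codes which side of each threshold counts as genuine. One possible way to sanity-check those directions is to compare the class means directly. This is a sketch under that assumption, with `midpoints_and_directions` as a hypothetical helper that is not part of the solution:

```python
import numpy as np

def midpoints_and_directions(data):
    """Return per-column midpoint thresholds and whether genuine rows lie below them."""
    features, labels = data[:, :-1], data[:, -1]
    counterfeit_mean = features[labels == 0].mean(axis=0)
    real_mean = features[labels == 1].mean(axis=0)
    threshold = (counterfeit_mean + real_mean) / 2
    # True where the genuine class has the lower mean, i.e. "below threshold" suggests genuine
    genuine_below = real_mean < counterfeit_mean
    return threshold, genuine_below

# Same arbitrary array as in the self-test above.
data = np.array([[2, 4, 6, 8, 0],
                 [4, 6, 8, 10, 0],
                 [1, 3, 5, 7, 1],
                 [3, 5, 7, 9, 1]])
threshold, genuine_below = midpoints_and_directions(data)
print(threshold.tolist(), genuine_below.tolist())
# [2.5, 4.5, 6.5, 8.5] [True, True, True, True]
```

On this toy array the genuine rows have the lower mean in every column, so each midpoint lies halfway between 3, 5, 7, 9 (counterfeit means) and 2, 4, 6, 8 (genuine means).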