# (Solved Homework): This is the test class code that I need help with: @Test public void testTask3example()

This is the test class code that I need help with:

@Test
public void testTask3example() {
    MarkovModel model = new MarkovModel(2, "aabcabaacaac");
    ModelMatcher match = new ModelMatcher(model, "aabbcaac");
    assertEquals(0, 1); //TODO replace with test code
}

The details for this task and code are below:

Task 3 – Matching test strings to a model (ModelMatcher)

For this task, you will need to complete the ModelMatcher class, which determines which of two Markov models better matches a test string. Given two Markov models built from text samples taken from two different sources, we can use them to estimate which source a test string was more likely drawn from. Even using only zeroth-order models constructed from English and Russian text, we should be able to tell, for instance, that the string

“Борщ безвкусен” (Russian for “the borscht is tasteless”)

is more likely to be Russian than English.

We will compute a measure of fit of test strings against models, called the likelihood of a sequence under the model. The likelihood of a sequence under a k-th order Markov model is calculated as follows:

For each symbol c in the sequence, compute the probability of observing c under the model, given its k-letter context (assuming the sequence to “wrap around”, as described above), using the Laplace-smoothed estimate of probability we used for the MarkovModel class.

Compute the likelihood of the entire sequence as the product of the likelihoods of each character.

In mathematical notation:

Let s be an input sequence of length n, and let M be a k-th order Markov model. To calculate the likelihood of the sequence s under the model M, for each symbol c_i in s (where 1 ≤ i ≤ n), let p_i be the k-length context of the symbol c_i (assuming wraparound). The likelihood of the sequence s under the model is

∏_{i=1}^{n} laplace(c_i)

where laplace is the Laplace-smoothed probability of c_i occurring (given its context p_i) as described in the previous task.

The probability we obtain from this calculation may be very small; in fact, potentially so small as to be indistinguishable from zero when using Java’s built-in floating-point arithmetic. Therefore, we will calculate and express the likelihood using log probabilities, which do not suffer from this problem. (A weakness of log probabilities is that they cannot straightforwardly represent probabilities of zero, but our use of Laplace smoothing allows us to avoid this problem.) The product of two probabilities p and q can be calculated by adding their log probabilities, log p and log q.
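In log space the product becomes a sum, which a short, self-contained sketch makes concrete (illustrative only; the class and method names here are not part of the assignment):

```java
public class LogSpaceDemo {
    // Product of probabilities computed as a sum of log10 probabilities:
    // log(p * q) = log p + log q, applied across the whole array.
    public static double logProduct(double[] probs) {
        double sum = 0.0;
        for (double p : probs) {
            sum += Math.log10(p);
        }
        return sum;
    }

    public static void main(String[] args) {
        // 1000 factors of 0.001: the direct product underflows to 0.0 ...
        double direct = 1.0;
        double[] probs = new double[1000];
        for (int i = 0; i < probs.length; i++) {
            probs[i] = 0.001;
            direct *= 0.001;
        }
        System.out.println(direct);            // underflows to 0.0
        // ... but the log-space sum is a perfectly ordinary double near -3000.
        System.out.println(logProduct(probs));
    }
}
```

Running main shows the direct product collapsing to 0.0 while the log-space sum stays representable, which is exactly why the assignment asks for log likelihoods.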

By way of example, suppose we have constructed a 2nd-order Markov model using the input string “aabcabaacaac”, as described in the example for Task 2. If we are then given a test string “aabbcaac”, we can compute its log likelihood as follows:

For each character in the test string, obtain its length-2 context (assuming wraparound). Note that the tri-gram “caa” occurs twice in the (wrapped) test string.

Characters and contexts in test string “aabbcaac”

| Context | Character | Frequency |
|---------|-----------|-----------|
| “aa”    | ‘b’       | 1         |
| “ab”    | ‘b’       | 1         |
| “bb”    | ‘c’       | 1         |
| “bc”    | ‘a’       | 1         |
| “ca”    | ‘a’       | 2         |
| “aa”    | ‘c’       | 1         |
| “ac”    | ‘a’       | 1         |
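The wraparound extraction used to build this table can be sketched as follows (a standalone illustration; `WrapNgrams` and `countNgrams` are hypothetical names, not the assignment's NgramAnalyser API):

```java
import java.util.HashMap;
import java.util.Map;

public class WrapNgrams {
    // Collect all length-(k+1) ngrams of s, wrapping past the end of the
    // string: appending the first k characters simulates the wraparound.
    public static Map<String, Integer> countNgrams(String s, int k) {
        Map<String, Integer> counts = new HashMap<>();
        String wrapped = s + s.substring(0, k);
        for (int i = 0; i < s.length(); i++) {
            String ngram = wrapped.substring(i, i + k + 1);
            counts.merge(ngram, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // For "aabbcaac" with k = 2 this yields 7 distinct trigrams,
        // with "caa" appearing twice, exactly as in the table above.
        System.out.println(countNgrams("aabbcaac", 2));
    }
}
```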

For each character in the test string, calculate its Laplace-smoothed probability of occurring (given its context), and then take the log of this.

Laplace estimate and log likelihoods for test string “aabbcaac”

| Context | Character  | Laplace estimate | log10 of Laplace estimate |
|---------|------------|------------------|---------------------------|
| “ac”    | ‘a’        | (2+1)/(2+3)      | -0.2218                   |
| “aa”    | ‘c’        | (2+1)/(3+3)      | -0.3010                   |
| “bc”    | ‘a’        | (1+1)/(1+3)      | -0.3010                   |
| “aa”    | ‘b’        | (1+1)/(3+3)      | -0.4771                   |
| “bb”    | ‘c’        | (0+1)/(0+3)      | -0.4771                   |
| “ca”    | ‘a’ (×2)   | (2+1)/(3+3)      | -0.6021                   |
| “ab”    | ‘b’        | (0+1)/(2+3)      | -0.6990                   |

(The “ca” row shows the combined log likelihood of both occurrences of that ngram.)

Then

total log likelihood = sum of log probabilities = -3.0792, and

average log likelihood = (sum of log probabilities) / (number of ngrams) = -3.0792 / 8 = -0.3849
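The totals can be checked directly by summing log10 of the eight Laplace estimates from the table. Here the counts are hardcoded from the worked example (model built on “aabcabaacaac”, alphabet size 3); `Task3Arithmetic` is an illustrative name, not part of the assignment:

```java
public class Task3Arithmetic {
    // Laplace estimates for the 8 ngrams of the wrapped test string
    // "aabbcaac": laplace = (count(ngram) + 1) / (count(context) + 3).
    public static double totalLogLikelihood() {
        double[] estimates = {
            (2 + 1.0) / (2 + 3), // "ac" -> 'a'
            (2 + 1.0) / (3 + 3), // "aa" -> 'c'
            (1 + 1.0) / (1 + 3), // "bc" -> 'a'
            (1 + 1.0) / (3 + 3), // "aa" -> 'b'
            (0 + 1.0) / (0 + 3), // "bb" -> 'c'
            (2 + 1.0) / (3 + 3), // "ca" -> 'a' (first occurrence)
            (2 + 1.0) / (3 + 3), // "ca" -> 'a' (second occurrence)
            (0 + 1.0) / (2 + 3), // "ab" -> 'b'
        };
        double total = 0.0;
        for (double p : estimates) {
            total += Math.log10(p);
        }
        return total;
    }

    public static void main(String[] args) {
        double total = totalLogLikelihood();
        // Matches the worked example: total = -3.0792, average = -0.3849
        System.out.printf("total = %.4f, average = %.4f%n", total, total / 8);
    }
}
```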

If we have two Markov models and one test string, then we can select the better model as the one that maximizes the likelihood of the string under the model; i.e. we choose the model with the higher average log likelihood. Notice that the example above lists ngrams from the one most in agreement with the model (highest log likelihood) to the least likely. This helps us to explain why a particular test string matches a model or not.
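The selection rule itself is just a comparison of the two averages. A minimal sketch (the names and the averages below are hypothetical, not values from the assignment):

```java
public class ModelSelection {
    // Pick the model whose average log likelihood of the test string is
    // higher (i.e. less negative): that model fits the string better.
    public static String betterModel(String nameA, double avgLogA,
                                     String nameB, double avgLogB) {
        return (avgLogA >= avgLogB) ? nameA : nameB;
    }

    public static void main(String[] args) {
        // Hypothetical averages for an English and a Russian model
        // scored against the same test string.
        System.out.println(betterModel("english", -1.9, "russian", -1.2));
    }
}
```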

You will need to write code in the ModelMatcher class that takes as input one Markov model and a “test string” for which we want to identify how well it matches the given model. Complete the ModelMatcher class as follows. You may wish to complete sub-task (c) before sub-task (b).

(a) Fill in the @author and @version fields in the header comments with all authors’ names, student numbers and the date.

(b) Complete the constructor, which calculates and sets the fields of the class. Getter methods for these fields are already complete.

(c) Complete the totalLogLikelihood method, a private helper method which sums a map containing log likelihoods for strings, and the averageLogLikelihood helper method, which calculates the average.

(d) Complete the toString method for the ModelMatcher class. This method should give a summary of the results. You may choose how to arrange the string, but it should include at least the name, the averageLogLikelihood, and a table of the log likelihood probabilities as shown in the example above.

(e) Ensure you complete Javadoc comments for each method and class, including @param, @return and @throws fields as required.

import java.util.HashMap;
import java.util.Collection;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Map;
import java.util.Set;

public class ModelMatcher
{

/** log likelihoods for a teststring under a given model */
private HashMap<String,Double> logLikelihoodMap;
/** summary statistic for this setting */
private double averageLogLikelihood;

/**
* Constructor to initialise the fields for the log likelihood map for
* a test string and a given Markov model and
* the average log likelihood summary statistic
* @param model a given Markov model object
* @param testString the test string to score against the model
*/
public ModelMatcher(MarkovModel model, String testString) {

int modelOrder = model.getK();
this.logLikelihoodMap = new HashMap<String, Double>();

double laplaceEstimate;
double logLikelihood;

NgramAnalyser stringNgram = new NgramAnalyser(modelOrder + 1,testString);

Set<String> distinctNgrams = stringNgram.getDistinctNgrams();

for (String ngram : distinctNgrams) {
laplaceEstimate = model.laplaceEstimate(ngram);
//log10 of the Laplace estimate gives the log likelihood
logLikelihood = Math.log10(laplaceEstimate);

this.logLikelihoodMap.put(ngram, logLikelihood);
}

this.averageLogLikelihood =
this.averageLogLikelihood(this.logLikelihoodMap, stringNgram.getNgramCount());

}

/** Helper method that calculates the average log likelihood statistic
* given a HashMap of strings and their Laplace probabilities
* and the total number of ngrams in the test string.
*
* @param logs map of ngram strings and their log likelihood
* @param ngramCount int number of ngrams in the original test string
* @return average log likelihood: the total of loglikelihoods
*    divided by the ngramCount
*/
private double averageLogLikelihood(HashMap<String,Double> logs, int ngramCount) {

double logSum = this.totalLogLikelihood(logs);

return (logSum / ngramCount);
}

/** Helper method to calculate the total log likelihood statistic
* given a HashMap of strings and their log likelihoods.
*
* @param logs map of ngram strings and their log likelihood
* @return the sum of the log likelihood values
*/
private double totalLogLikelihood(HashMap<String,Double> logs) {
double logSum = 0;

for (Map.Entry<String, Double> entry : logs.entrySet()) {
logSum += entry.getValue();
}

return (logSum);
}

/**
* @return the average log likelihood statistic
*/
public double getAverageLogLikelihood() {
return averageLogLikelihood;
}

/**
* @param ngram the ngram to look up
* @return the log likelihood value for a given ngram from the input string
*/
public double getLogLikelihood(String ngram) {
return (logLikelihoodMap.get(ngram));
}

/**
* Make a String summarising the log likelihood map and its statistics
* @return Header lines containing the testString, the averageLogLikelihood
*         and a sorted table of ngrams and their loglikelihoods
* The likelihood table should be ordered from highest to lowest likelihood
*/
public String toString() {
String returnString = "";

returnString = returnString + Double.toString(this.averageLogLikelihood);
returnString = returnString + NgramAnalyser.printHashMap(this.logLikelihoodMap);

return returnString;
}

}

ProjectTest.java

import static org.junit.Assert.*;
import org.junit.After;
import org.junit.Before;
import org.junit.Test;
import java.util.Set;

public class ProjectTest
{
/**
* Default constructor for test class ProjectTest
*/
public ProjectTest() {

}

/**
* Sets up the test fixture.
*
* Called before every test case method.
*/
@Before
public void setUp() {
}

/**
* Tears down the test fixture.
*
* Called after every test case method.
*/
@After
public void tearDown() {

}

//TODO add new test cases from here; include brief documentation for each

/**
* Test if the number of lines in a string output from NgramAnalyser.toString()
* is valid (i.e. at least equal to the size of the alphabet of that analyser).
* Also ensures that the constructor validates its arguments as required.
*/
@Test(timeout=1000)
public void testSensibleToStringSize() {

String[] stringsToTest = {"Hello my friend",
    "be",
    "Have a nice day you filthy animal",
    "asdfghjkl$$sdfghj%%",
    "2",
    "    "};

Integer[] ngramSizesToTest = {1, 2, 3, 4, 5};

NgramAnalyser analysis;
String analysisString;

for (int i : ngramSizesToTest) {
    for (String s : stringsToTest) {
        if (i > s.length()) {
            try {
                analysis = new NgramAnalyser(i, s);
                fail("Expected IllegalArgumentException for ngram size " + i);
            } catch (IllegalArgumentException e) {
                // expected: ngram size larger than the test string
            }
        } else {
            analysis = new NgramAnalyser(i, s);
            analysisString = analysis.toString();

            //Number of lines is equal to the number of '\n' characters plus 1
            int numberOfLines = analysisString.length() -
                analysisString.replace("\n", "").length() + 1;

            assertTrue(numberOfLines >= analysis.getAlphabetSize());
        }
    }
}
}

/**
* Tests various aspects of the getDistinctNgrams function
* including set length, …
*/

@Test(timeout=1000)
public void testGetDistinctNgrams() {
String[] stringsToTest = {
    "123!@#123!@#",
    "aaaaaa",
    "HOWWEYVUFXBINEF"
};

int ngramSize = 2;

for (String stringToTest : stringsToTest) {
    NgramAnalyser analysis = new NgramAnalyser(ngramSize, stringToTest);

    Set<String> distinctNgrams = analysis.getDistinctNgrams();
    int distinctNgramCount = analysis.getDistinctNgramCount();
    int totalNgramCount = analysis.getNgramCount();

    //The distinct ngram count should agree with the size of the distinct ngram set
    assertEquals(distinctNgramCount, distinctNgrams.size());

    //Test that there are fewer or equal distinct ngrams than total ngrams
    assertTrue(distinctNgramCount <= totalNgramCount);

    //Test that the alphabet size is smaller than
    //or equal to the number of distinct ngrams
    assertTrue(analysis.getAlphabetSize() <= distinctNgramCount);
}

}

@Test(timeout=1000)
public void testLaplaceExample() {
assertEquals(0,1); //TODO replace with test code
}

@Test(timeout=1000)
public void testSimpleExample() {
assertEquals(0,1); //TODO replace with test code
}

@Test(timeout=1000)
public void testTask3example() {
    MarkovModel model = new MarkovModel(2, "aabcabaacaac");
    ModelMatcher match = new ModelMatcher(model, "aabbcaac");
    assertEquals(0, 1); //TODO replace with test code
}
}

ModelMatcher.java

import java.util.HashMap;
import java.util.Collection;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.SortedSet;
import java.util.TreeSet;

public class ModelMatcher
{

/** log likelihoods for a teststring under a given model */
private HashMap<String,Double> logLikelihoodMap;
/** summary statistic for this setting */
private double averageLogLikelihood;

/**
* Constructor to initialise the fields for the log likelihood map for
* a test string and a given Markov model and
* the average log likelihood summary statistic
* @param model a given Markov model object
* @param testString the string to check compatibility with the model
*/
public ModelMatcher(MarkovModel model, String testString)
{
logLikelihoodMap = new HashMap<String, Double>();
int kVal = model.getK();
String seq;
for (int curPos = 0; (curPos) < testString.length(); curPos++)
{
if (curPos + (kVal + 1) <= testString.length())
{
seq = testString.substring(curPos, (curPos + kVal + 1));
}
else
{
String fromEnd = testString.substring(curPos, (testString.length()));
String fromStart = testString.substring(0, ((kVal + 1) - fromEnd.length()));
seq = fromEnd + fromStart;
}

// seq should be of length k+1
// context should be the first k chars of seq
String context = seq.substring(0, kVal);
// impChar should be the last character of seq
String impChar = seq.substring(kVal, seq.length());

double loggedProb = Math.log10(model.laplaceEstimate(seq));
if (logLikelihoodMap.containsKey(seq))
{
logLikelihoodMap.put(seq, (logLikelihoodMap.get(seq) + loggedProb));
}
else
{
logLikelihoodMap.put(seq, loggedProb);
}
}
averageLogLikelihood = averageLogLikelihood(logLikelihoodMap, testString.length());

}

/** Helper method that calculates the average log likelihood statistic
* given a HashMap of strings and their Laplace probabilities
* and the total number of ngrams in the test string.
*
* @param logs map of ngram strings and their log likelihood
* @param ngramCount int number of ngrams in the original test string
* @return average log likelihood: the total of loglikelihoods
*    divided by the ngramCount
*/
private double averageLogLikelihood(HashMap<String,Double> logs, int ngramCount)
{
double totalLogs = 0.0;
for (String i : logs.keySet())
{
totalLogs += logs.get(i);
}
double avg = totalLogs/ngramCount;
return avg;
}

/** Helper method to calculate the total log likelihood statistic
* given a HashMap of strings and their log likelihoods.
*
* @param logs map of ngram strings and their log likelihood
* @return the sum of the log likelihood values
*/
private double totalLogLikelihood(HashMap<String,Double> logs)
{
double totalLogs = 0.0;
for (String i : logs.keySet())
{
totalLogs += logs.get(i);
}
return totalLogs;
}

/**
* @return the average log likelihood statistic
*/
public double getAverageLogLikelihood()
{
return averageLogLikelihood;
}

/**
* @param ngram String the ngram to find the log likelihood of.
* @return the log likelihood value for a given ngram from the input string
*/
public double getLogLikelihood(String ngram)
{
return (logLikelihoodMap.get(ngram));
}

/**
* Make a String summarising the log likelihood map and its statistics
* @return String of ngrams and their loglikeihood differences between the models
* The likelihood table should be ordered from highest to lowest likelihood
*/
public String toString()
{
SortedSet<String> keysArray = new TreeSet<String>(logLikelihoodMap.keySet());
String toRet = "";
for (String key : keysArray)
{
    double logLike = logLikelihoodMap.get(key);
    String logLikeS = Double.toString(logLike);
    String thisKey = (key + "     " + logLikeS + "\n");
    toRet += thisKey;
}
//TODO: the spec asks for ordering by likelihood (highest first); this orders alphabetically
return toRet;
}

}

ProjectTest.java

import static org.junit.Assert.*;
import org.junit.After;
import org.junit.Before;
import org.junit.Test;

/**
* The test class ProjectTest for student test cases.
*/
public class ProjectTest
{
/**
* Default constructor for test class ProjectTest
*/
public ProjectTest()
{
}

/**
* Sets up the test fixture.
*
* Called before every test case method.
*/
@Before
public void setUp()
{
}

/**
* Tears down the test fixture.
*
* Called after every test case method.
*/
@After
public void tearDown()
{
}

//TODO add new test cases from here; include brief documentation for each

@Test(timeout=1000)
public void testSensibleToStringSize() {
NgramAnalyser analyser = new NgramAnalyser("aabcabfaacfaaac");
int minLines = analyser.getAlphabetSize() + 1;
//This next bit is ridiculously inefficient memory-wise, but it works.
String[] lines = analyser.toString().split("\r\n|\r|\n");
int linesLength = lines.length;
//System.out.println(linesLength);
//System.out.println(minLines);
assertTrue(linesLength >= minLines);

}

@Test(timeout=1000)
public void testGetDistinctNgrams() {
NgramAnalyser analyser = new NgramAnalyser("aaa");
NgramAnalyser analyser1 = new NgramAnalyser(3, "abc");
NgramAnalyser analyser2 = new NgramAnalyser(3, "bcbabbcbc");
NgramAnalyser analyser3 = new NgramAnalyser(2, "baba");

assertEquals(analyser.getDistinctNgrams().size(),1);
assertEquals(analyser1.getDistinctNgrams().size(),3);
assertEquals(analyser2.getDistinctNgrams().size(),6);
assertEquals(analyser3.getDistinctNgrams().size(),2);

}

@Test(timeout=1000)
public void testLaplaceExample() {
MarkovModel mkMdl = new MarkovModel(2, "aabcabaacaac");
double c = mkMdl.laplaceEstimate("aac");
double b = mkMdl.laplaceEstimate("aab");
double a = mkMdl.laplaceEstimate("aaa");
//System.out.println(c);
assertTrue(c >= 0.4999 && c <= 0.5001);
assertTrue(b >= 0.3332 && b <= 0.3334);
assertTrue(a >= 0.1666 && a <= 0.1668);
}

@Test(timeout=1000)
public void testSimpleExample() {
MarkovModel mkMdl = new MarkovModel(2, "aabcabaacaac");
double b = mkMdl.simpleEstimate("aab");
//System.out.println(b);
assertTrue(b == (1.0/3.0));
}

@Test(timeout=1000)
public void testTask3example() {
    MarkovModel model = new MarkovModel(2, "aabcabaacaac");
    ModelMatcher match = new ModelMatcher(model, "aabbcaac");
    //Precision issues are amazing...
    assertEquals(-0.3849, match.getAverageLogLikelihood(), 0.0001);
}
}