I’ve been very lax in my blog of late. A lot of really cool stuff has happened since the last post, I got into Georgia Tech’s MS CS program. Took some really awesome classes, and then got driven batshit crazy with the homework, assignments and projects. Don’t get me wrong, I love every minute of it, but it’s tiring nonetheless.
I thought I’d share one of my better assignments from my Knowledge-Based AI (KBAI) class with my readers. We’ve been trying to answer Raven’s Progressive Matrices using various KBAI techniques. This post will try to solve the test using the Learning by Recording Cases method.
Learning by Recording Cases
An algorithm that learns by cases normally makes use of an AI consistency technique. This technique compares previous cases to find the most similar of the cases. Learning by Recording Cases (henceforth referred to as LRC) is used most commonly in Natural Language Processing.
Consider a chatbot, Eliza. If Eliza was given the information in her knowledge base that the answer to the question, “What is your name?”, is “My name is Eliza”. Also suppose there was a question, “Your name is really pretty.”, and the answer was, “Why thank you!”. Then if the chatterbot was asked “And your name is?”, the AI Agent could calculate the similarity index of the two sentences, and this would tell us which one is closer in meaning to the given input. Using the online UMBC Similarity Service, we can actually get a measure of this similarity as our AI Agent would (see figure 5.1) and thus calculate which answer Eliza would give in return.
Here, the Similarity Score of 1.0 is greater than 0.75. Thus Eliza realises that the user’s input is most similar to the input “What is your name?”. Thus Eliza uses this pre-recorded case that she has learnt from to analyse and compare the inputs, correct for changes in phrasing, and then deliver her final output, “My name is Eliza”.
This algorithm, using UMBC for a similarity score, was performed by me in my Bachelor’s Degree Research Project to create a Chatterbot using Knowledge Based AI, with Ontology and Learning by Recording Cases. However, while this algorithm works in most cases, at times the output that is provided can be incorrect. More fine tuning was needed to get perfect answers. However, the comparison greatly improved the accuracy of the results that were correct with better contextual learning.
A heuristic such as the Similarity Score shown in the previous example is called a consistency heuristic. A consistency heuristic implies attributing certain properties of cases seen previously before to a new, unseen object. Once this has been attributed, we use a AI algorithm such as Nearest Neighbour to calculate the difference in the similarities of the cases around the new object. The nearest, or most similar, case is then adopted, and it’s solution is provided.
Nearest neighbour can be calculated in many ways, however the most efficient (especially for larger data sets) is to use that of the KD Decision tree using Guillotine cuts. Ideally a good attribute to split the decision tree on is one that splits subsets into all positive or all negative subsets (after the function is tested). But this is rare, so we use the “best” attribute by calculating the “goodness” factor of the Guillotine cut. Here the new object is tested (by each non-leaf node) against a certain parameter or test. Then if this object passes that test, the object is sent forward for more tests.
This is called a KD Tree (see the Fig. 5.2 above). The leaf node is reached by recursively examining nearby points, calculating the test for similarity and exploring the branches of the tree that are nearest (or most similar) to the query point, (see Fig. 5.3 above). Now that the leaf is reached we can calculate the distance between the query node to each of the nodes in the leaf. This will tell us which classification the new data point should fall under.
The NN search by the KD tree reduces the complexity (in both space and time) of the NN search. This method only fails in high dimensionality. Imagine a tree with 200 attributes, but only 1 is relevant to the necessary point. Then the NN is easily mislead and a lot of backtracking or recalculation of the neighbours are needed.
Algorithm for Learning By Recording Cases
- Create a decision tree for the training set
- Find attributes to split on using “goodness” of the Guillotine cut factors.
- A “good” attribute would be one where all the cases are split into positive or negative classes. For instance, to classify humans, sex would be a good attribute since all humans can be split into male and female accordingly.
- Compare the new query value to the current node attribute.
- At each non-leaf node classify the new query by comparing it with the test cases.
- Some cases may be eliminated completely. For those perform pruning of the tree.
- If the classification has been shortlisted to one guillotine cut (or a leaf node), then goto 4, else goto 2.
- Once the leaf node has been reached calculate the consistency heuristic for each point in the leaf node. Choose the point that is most similar to be the correct case.
- Use the output of that case as the output of the solution.
Applying Learning by Recording Cases to the Raven’s Test
- Create the decision tree for the data.
- Here attributes would be the features of the objects in the Raven’s test.
- Compare the goodness factor of the features to choose which to split on. For instance let us compare “Shape” with “Angle”
- Here, we see that shape is a definite attribute. The objects are definitely a triangle or not. Similarly, they are either a square or circle or not. They cannot be both a square and a circle at the same time. However, for Angle, the answer changes. For the objects angle to change, the object could be either Rotated or Reflected, or both. Thus rotation of 180 could actually be caused by either rotation or a horizontal reflection. This is more difficult to classify.
- Thus based on goodness of attributes we initially split our decision tree on “Shape”.
- Similarly choose other attributes to split the guillotine cut in the KD tree at each level. (This split can be calculated using tools such as Weka using similar consistency heuristics instead of manually)
- Now compare the question as shown below in Fig 5.5.
- The first 2 Shapes are compared. They are the same. ie, A = circle, B=circle. Here the change from A to B is our Training data. The tree learns by recording a case that if A’s shape is a shape X (in this case a circle) then B must also be classified the same way.
- Since C is of shape Y (ie in this case a square) then D must also be classified the same way.
- The algorithm now shortlists the Answer options down using the deduction made in Step 4. Thus now the answer option is 3, and 5.
- The next node on the decision tree is reached, this is the “Fill” attribute split. The output of the fill attribute is Yes or No.
- During the training, the tree records a case that if the Shape A is classified as Fill:No, then the Shape B must have values of Fill:Yes. Thus the tree understands that in the binary value Yes:No, there is a toggle.
- Now from Answer Options 3 & 5, the algorithm compares the consistency heuristic. Since C is Fill:No, therefore D must have values Fill:Yes.
- Thus Option 5 is shortlisted. The consistency heuristic metric reveals that Option 5 is the most similar to all the answers. Therefore Option 5 is selected because it is the closest in similarity to a pre-existing case that has been recorded.
- UMBC Similarity Scores for Phrases :: http://swoogle.umbc.edu/SimService/GetSimilarity
- Guillotine Split : Concept taught in CS 6601 by Prof. Thad Starner and applied to the Learning by Recording Cases algorithm due to the similarity in the types of problems faced in NN for higher dimensions.
- P.H. Winston. Artificial Intelligence. (2nd edition) Artificial Intelligence Laboratory, MIT, Addison-Wesley, Reading, Mass (1984)