Motivation

As described in my original article, News Goggles is the name of an imaginary web site focused on daily news feeds. A proof-of-concept rules engine applies matchmaker rules that dispatch news feeds to readers based on their personal preferences.

The original implementation used Prolog and Java integration. My prior article Turning Functioning Logic into Logical Functions describes how I converted a simple poker game rules engine from Prolog and Java to Scala and Java.

I have similarly converted the News Goggles rules engine into a Scala and Java implementation.

The Setup

This demonstration rules engine uses Scala, a language that runs on the Java Virtual Machine (JVM).

A complete Scala integrated development environment (IDE), based on Eclipse, is available here: http://typesafe.com/stack/downloads/scala-ide

The News Goggles demo implementation is in the public domain and is available for download from this GitHub repository: https://github.com/rrutt/NewsGogglesScala.git

To run the demo, clone the GitHub repository to your local workstation. Open a command-prompt (terminal) window and navigate to the root folder of your cloned repository. Enter one of the following commands. (Use forward slashes instead of backslashes on Linux or Mac.) The second form also runs the built-in tests. The program will process the rules, display progress messages and results, and then exit.

  java -jar out\NewsGogglesScala.jar
  
  java -jar out\NewsGogglesScala.jar -test

The Application

Here are the business requirements for the News Goggles rules engine:

  1. Providers publish Articles.
  2. Subscribers open the web site to review Articles.
  3. The Providers tag each Article with one or more Topics.
  4. Each Subscriber registers a set of Subscription Rules:
    • A Subscriber can Like any number of Providers and/or Topics.
    • A Subscriber can also Dislike any number of Providers and/or Topics.
    • Dislike rules override Like rules.
  5. A Subscriber can also enter Exception Rules that override their Subscription Rules:
    • A Subscriber can Allow a specific combination of Provider covering a Topic.
    • A Subscriber can Block a specific combination of Provider covering a Topic.
    • Block rules override Allow rules.
  6. Identifier conventions: Except for Articles, identifiers are quoted literals. (A small classification sketch follows this list.)
    • Articles are identified by arbitrary unique integers. The content of each Article is a quote-delimited string.
    • Providers are identified with a $ prefix. Examples:
      "$CNN" "$FOX" "$ESPN" "$CBC" "$NBC" "$Wired" "$NYTimes"
    • Topics are identified with a # prefix. Examples:
      "#world" "#usa" "#sports" "#entertainment" "#business" "#politics" "#opinion"
    • Subscribers are identified with an @ prefix. Examples:
      "@Alice" "@Bob" "@Chris" "@Pat"
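
As a small illustration of these conventions, identifiers can be classified by their prefix character. The identifierKind helper below is hypothetical and not part of the NewsGogglesScala code:

  // Hypothetical helper illustrating the prefix conventions above.
  def identifierKind(id: String): String = id.headOption match {
    case Some('$') => "Provider"
    case Some('#') => "Topic"
    case Some('@') => "Subscriber"
    case _         => "Article or unknown"
  }

  identifierKind("$ESPN")   // "Provider"
  identifierKind("#sports") // "Topic"
  identifierKind("@Alice")  // "Subscriber"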

Let’s see how our data model for storing the news feed data and the preference rules looks in the original Prolog notation:

/*
Data-base of published articles, designated as 
  article(ArticleId, Provider, Contents).
  article_topic(ArticleId, Topic).
*/

  article(1001, "$ESPN", "Tigers sign Prince Fielder.").
  article_topic(1001, "#sports").
  article_topic(1001, "#baseball").
  article_topic(1001, "#detroit").
  
  article(1002, "$ESPN", "Peyton Manning home team locker to be used by arch-nemesis Tom Brady?").
  article_topic(1002, "#sports").
  article_topic(1002, "#indianapolis").
  article_topic(1002, "#football").
  
  article(2001, "$FOX", "Newt nabs South Carolina.").
  article_topic(2001, "#politics").
  article_topic(2001, "#republicans").
  
  article(2002, "$FOX", "Do we need to go back to the moon?").
  article_topic(2002, "#usa").
  article_topic(2002, "#opinion").
  
  article(3001, "$CNN", "Obama visits Michigan.").
  article_topic(3001, "#politics").
  article_topic(3001, "#democrats").
  article_topic(3001, "#detroit").
  
  article(3002, "$CNN", "Obama urges congress to act in election year.").
  article_topic(3002, "#politics").
  article_topic(3002, "#democrats").
  article_topic(3002, "#washington").
  
  article(4001, "$MSNBC", "Mitt makes moves in Florida.").
  article_topic(4001, "#politics").
  article_topic(4001, "#republicans").
  
  article(4002, "$MSNBC", "Celebrities attending Super Bowl parties.").
  article_topic(4002, "#indianapolis").
  article_topic(4002, "#entertainment").

/*
Rule-base of Subscriber preferences, designated as:
  subscriber_likes(Subscriber, ProviderOrTopic).
  subscriber_dislikes(Subscriber, ProviderOrTopic).
  subscriber_allows(Subscriber, Provider, Topic).
  subscriber_blocks(Subscriber, Provider, Topic).
*/

  subscriber_likes("@Alice", "#politics").
  subscriber_likes("@Alice", "#sports").
  subscriber_likes("@Alice", "$CNN").
  subscriber_likes("@Alice", "$MSNBC").
  
  subscriber_likes("@Bob", "#politics").
  subscriber_likes("@Bob", "#detroit").
  subscriber_dislikes("@Bob", "#democrats").
  subscriber_dislikes("@Bob", "$CNN").
  subscriber_allows("@Bob", "$CNN", "#detroit").
  
  subscriber_likes("@Chris", "#politics").
  subscriber_dislikes("@Chris", "$FOX").
  
  subscriber_likes("@Pat", "$FOX").
  subscriber_likes("@Pat", "#sports").
  subscriber_likes("@Pat", "#republicans").
  subscriber_blocks("@Pat", "$FOX", "#opinion").  

Data Parser

In the original NewsGoggles implementation, the Prolog engine processed the articles and preferences data feeds into its in-memory database. For this Scala implementation, a custom parser was defined to load these data feeds into a set of Scala Map and Set structures.

Fortunately, Scala provides excellent support for writing parsers for domain-specific languages (DSLs). Part of that support comes from the fact that method names can consist of sequences of special characters. Combined with Scala’s optional "dot free" infix notation, this lets several libraries provide custom internal DSL syntax that the normal Scala compiler can understand directly.

One such internal DSL, in the scala.util.parsing.combinator package, allows grammars to be defined for a custom external DSL.
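
As a quick aside, an expression such as p ~> q is just an ordinary method call written without the dot. The toy Step class below is only an illustration of that mechanism; it is not part of the parser library or the NewsGoggles code:

  object InfixDemo extends App {
    // A method whose name consists of special characters,
    // invoked using the optional "dot free" infix notation.
    class Step(val text: String) {
      def ~>(next: Step): Step = new Step(text + " then " + next.text)
    }

    val combined = new Step("parse A") ~> new Step("parse B")
    // Equivalent to: (new Step("parse A")).~>(new Step("parse B"))
    println(combined.text) // prints: parse A then parse B
  }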

Here is the beginning of my custom RulesDataParser:

package com.live.rrutt.newsgoggles.scala

import scala.util.parsing.combinator._

class RulesDataParser extends JavaTokenParsers {

  object RuleType extends Enumeration {
    type RuleType = Value
    val article, article_topic, 
      subscriber_likes_provider, subscriber_dislikes_provider, 
      subscriber_likes_topic, subscriber_dislikes_topic, 
      subscriber_allows, subscriber_blocks = Value
  }
  import RuleType._

  def rulesList = rep(rule)
  
  def rule = article | article_topic | 
    subscriber_likes | subscriber_dislikes |
    subscriber_allows | subscriber_blocks

I defined an enumeration of my rule types, along with my first two grammar rules:

  • rulesList consists of a repetition of rule constructs.
  • rule consists of a choice of several other constructs.

Here is the grammar breakdown for the article construct:

  def article = "article" ~> articleBody ^^ {
    case (a, p, c) => (RuleType.article, (a, p, c))
  }
  def articleBody = "(" ~> articleId ~ articleProviderContents ^^ {
    case a ~ pc => {
      val (p, c) = pc
      (a, p, c)
    }
  }
  def articleProviderContents = "," ~> provider ~ articleContents ^^ {
    case p ~ c => (p, c)
  }
  def articleContents = "," ~> articleText <~ ")" <~ "."

Here we see some of the internal DSL notation defined by the scala.util.parsing library, which corresponds to grammar parsing support methods (a small self-contained example follows this list):

  • A quoted string literal defines a terminal text token.
  • ~> separates parse elements and indicates any element(s) to the left can be discarded from the resulting abstract syntax tree.
  • ~ separates parse elements and indicates both elements must be returned in the abstract syntax tree.
  • <~ separates parse elements and indicates any element(s) to the right can be discarded from the resulting abstract syntax tree.
  • ^^ ends the parse element definition and begins the production logic to customize the resulting abstract syntax tree.
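
To make those operators concrete, here is a small self-contained sketch of a toy grammar (not part of the NewsGoggles parser) that parses a statement of the form greeting("@Alice"):

  import scala.util.parsing.combinator._

  object GreetingParser extends JavaTokenParsers {
    // "greeting", the parentheses, and the period are terminal tokens;
    // ~> and <~ discard them, so only the quoted name reaches ^^,
    // where the production logic strips the surrounding quotes.
    def greeting: Parser[String] =
      "greeting" ~> "(" ~> stringLiteral <~ ")" <~ "." ^^ {
        name => name.stripPrefix("\"").stripSuffix("\"")
      }

    def main(args: Array[String]): Unit = {
      println(parseAll(greeting, """greeting("@Alice")."""))
      // prints something like: [1.21] parsed: @Alice
    }
  }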

Next we see some conditional production logic that relies on our custom data language’s convention that provider literals start with "$" and topic literals start with "#". A subscriber_likes data statement parses into either a subscriber_likes_provider or a subscriber_likes_topic rule type in the resulting abstract syntax tree.

  def subscriber_likes = "subscriber_likes" ~> subscriber_likes_dislikesBody ^^ {
    case (s, tp) => {
      if (tp.startsWith("$")) {
        (RuleType.subscriber_likes_provider, (s, tp))
      } else {
        (RuleType.subscriber_likes_topic, (s, tp))
      }
    }
  } 
  def subscriber_dislikes = "subscriber_dislikes" ~> subscriber_likes_dislikesBody ^^ {
    case (s, tp) => {
      if (tp.startsWith("$")) {
        (RuleType.subscriber_dislikes_provider, (s, tp))
      } else {
        (RuleType.subscriber_dislikes_topic, (s, tp))
      }
    }
  } 
  def subscriber_likes_dislikesBody = "(" ~> subscriber ~ topicOrProvider <~ ")" <~ "." ^^ {
    case s ~ tp => (s, tp)
  }
  def topicOrProvider = "," ~> textString

Our textString rule strips any quotes off parsed text elements so only the actual contents are returned in our syntax tree.

  def textString = stringLiteral ^^ {
    // Strip off any surrounding quotes.
    case s if (s.startsWith("\"") && s.endsWith("\"")) => {
      s.substring(0, (s.length() - 1)).substring(1)
    }
    case s => s
  }

Note that my original Prolog data file used apostrophes (single quotes) to delimit some elements. The JavaTokenParsers base class my parser inherits from has trouble with multi-character literals delimited by apostrophes. I avoid the issue by replacing all apostrophes in my data files with double quotes.

I avoided having to define grammar rules for data file comments by stripping them out in the DataLoader.java class that reads the files (sb is a StringBuilder, declared earlier in the class, that accumulates the non-comment lines):

      InputStreamReader isr = new InputStreamReader(theoryInputStream);
      BufferedReader br = new BufferedReader(isr);

      boolean skippingComments = false;
      String s = br.readLine();
      while (s != null) {
        if (skippingComments) {
          if (s.trim().endsWith("*/")) {
            skippingComments = false;
          }
        } else if (s.trim().startsWith("/*")) {
          skippingComments = true;
        } else {
          sb.append(s);
          sb.append('\n');
        }

        s = br.readLine();
      }

Here is the Scala method that uses our custom parser to load our data files into an abstract syntax tree, captured in a local 2-tuple called parsedRulesData:

  /*
   * Logic to load the dynamic data,
   * which consists of the current set of Articles
   * and the Subscriber preferences.
   */
  def loadData(rulesDataText: String): Boolean = {
    val p = new RulesDataParser
    val parseResult = p.parseAll(p.rulesList, rulesDataText) 
    val parsedRulesData = parseResult match {
      case p.Success(ast, _) => {
        println("----- Parse Result:")
        println(ast.toString())
        (true, ast)
      }
      case p.Failure(msg, next) => {
        println("----- Failure Message:")
        println(msg.toString())
        println("----- Parsed:")
        println(rulesDataText.substring(0, next.offset))
        println("----- Unparsed:")
        println(next.source)
        (false, null)
      }
      case p.Error(msg, next) => {
        println("----- Error Message:")
        println(msg.toString())
        println("----- Parsed:")
        println(rulesDataText.substring(0, next.offset))
        println("----- Unparsed:")
        println(next.source)
        (false, null)
      }
    }

Here is the beginning of the logic to convert the abstract syntax tree into our internal Map and Set structures:

  var mapArticleProviderContents = mutable.Map.empty[String, (String, String)]
  var mapArticleTopics = mutable.Map.empty[String, List[String]]
  
  var setProviders = mutable.Set.empty[String]
  
    . . .

    parsedRulesData match {
      case (true, ast) => {
        for (rule <- ast) {
          rule match {
            case (p.RuleType.article, (articleId, provider, contents)) => {
              val a = articleId.toString()
              val p = provider.toString()
              val c = contents.toString()
              mapArticleProviderContents(a) = (p, c)
              setProviders += p
            }
            case (p.RuleType.article_topic, (articleId, topic)) => {
              val a = articleId.toString()
              val t = topic.toString()
              if (mapArticleTopics.contains(a)) {
                mapArticleTopics(a) ::= t
              } else {
                mapArticleTopics(a) = List(t)
              }
            }
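
The remaining rule types are folded into their own maps and sets in the same way. As a rough sketch (assuming a mutable mapSubscriberLikesProviders map declared like mapArticleTopics above), the subscriber_likes_provider case might look like this:

            // Sketch only; the actual repository code may differ in detail.
            case (p.RuleType.subscriber_likes_provider, (subscriber, provider)) => {
              val s = subscriber.toString()
              val pr = provider.toString()
              if (mapSubscriberLikesProviders.contains(s)) {
                // Prepend this provider to the subscriber's existing list.
                mapSubscriberLikesProviders(s) ::= pr
              } else {
                mapSubscriberLikesProviders(s) = List(pr)
              }
            }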

Mid-Level Rules Engine

Using the techniques described in my prior article Turning Functioning Logic into Logical Functions, I converted various Prolog rules into Scala.

Here are the rules to determine whether a subscriber likes or dislikes the provider of an article. Note that each article has only one provider, so we simply check whether the subscriber’s list of providers contains the article’s provider:

  def subscriber_likes_article_provider(s: String, a: String): Boolean = {
    subscriberProviderMapMatchesArticle(s, mapSubscriberLikesProviders, a)
  }  
  def subscriber_dislikes_article_provider(s: String, a: String): Boolean = {
    subscriberProviderMapMatchesArticle(s, mapSubscriberDislikesProviders, a)
  }
  def subscriberProviderMapMatchesArticle(s: String, providerMap: Map[String, List[String]], a: String): Boolean = {
    mapArticleProviderContents get a match {
      case None => false
      case Some((p, _)) => {
        providerMap get s match {
          case Some(pList) => pList.contains(p)
          case _ => false
        }
      }
    } 
  }
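
Given the sample data above (and assuming it has been loaded via loadData), a couple of spot-checks of these functions would behave like this:

  // @Alice likes $CNN and $MSNBC; article 3001 is published by $CNN.
  subscriber_likes_article_provider("@Alice", "3001")    // expected: true
  // Article 1001 is published by $ESPN, which @Alice neither likes nor dislikes.
  subscriber_likes_article_provider("@Alice", "1001")    // expected: false
  subscriber_dislikes_article_provider("@Alice", "1001") // expected: false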

The rules for subscriber vs. article topics are slightly different, since each article can have several topics. We look for an intersection between the subscriber’s list of topics and the article’s list of topics:

  def subscriber_likes_article_topic(s: String, a: String): Boolean = {
    subscriberTopicsMatchArticle(mapSubscriberLikesTopics get s, a)
  }  
  def subscriber_dislikes_article_topic(s: String, a: String): Boolean = {
    subscriberTopicsMatchArticle(mapSubscriberDislikesTopics get s, a)
  }
  def subscriberTopicsMatchArticle(sTopics: Option[List[String]], a: String): Boolean = {
    val aTopics = mapArticleTopics get a
    (aTopics, sTopics) match {
      case (Some(atList), Some(stList)) => !((atList intersect stList) isEmpty)
      case _ => false
    }
  }
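
Again using the sample data, a quick spot-check:

  // @Bob likes #politics and #detroit; article 1001 is tagged #sports, #baseball, #detroit.
  subscriber_likes_article_topic("@Bob", "1001")   // expected: true
  // @Chris likes only #politics, and article 1001 carries no #politics tag.
  subscriber_likes_article_topic("@Chris", "1001") // expected: false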

The rules for a subscriber allowing or blocking a combination of provider and topic add a little more complexity. We obtain the single provider for the article, as well as its list of topics. We then map the provider into a 2-tuple paired with each of the topics. Finally we check for an intersection between that list of 2-tuples and the subscriber’s list of (provider, topic) pairs:

  def subscriber_allows_article_provider_and_topic(s: String, a: String): Boolean = {
    subscriberProviderTopicsMatchArticle(mapSubscriberAllowsProviderTopics get s, a)
  }  
  def subscriber_blocks_article_provider_and_topic(s: String, a: String): Boolean = {
    subscriberProviderTopicsMatchArticle(mapSubscriberBlocksProviderTopics get s, a)
  }  
  def subscriberProviderTopicsMatchArticle(sProviderTopics: Option[List[(String, String)]], a: String): Boolean = {
    val aProviderContents = mapArticleProviderContents get a
    val aTopics = mapArticleTopics get a
    val aProviderTopics = (aProviderContents, aTopics) match {
      case (Some((ap, _)), Some(atList)) => atList.map{at => (ap, at)}
      case _ => List.empty[(String, String)]
    }
    (aProviderTopics, sProviderTopics) match {
      case (aptList, Some(sptList)) => !((aptList intersect sptList) isEmpty)
      case _ => false
    }
  }
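
And for the Exception Rules, again based on the sample data:

  // @Bob allows the ($CNN, #detroit) combination;
  // article 3001 is from $CNN and is tagged #detroit (among other topics).
  subscriber_allows_article_provider_and_topic("@Bob", "3001") // expected: true
  // @Pat blocks the ($FOX, #opinion) combination;
  // article 2002 is from $FOX and is tagged #opinion.
  subscriber_blocks_article_provider_and_topic("@Pat", "2002") // expected: true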

Top-Level Rules Engine

Once we have the mid-level rules defined, the top-level rules appear relatively simple. We are able to use pattern matching with guard conditions based on the mid-level rules:

  def article_is_visible_to_subscriber(a: String, s: String): Boolean = {
    (a, s) match {
      case (a, s) if subscriber_blocks_article_provider_and_topic(s, a) => false
      case (a, s) if subscriber_allows_article_provider_and_topic(s, a) => true
      case (a, s) if subscriber_dislikes_article_provider(s, a) => false
      case (a, s) if subscriber_dislikes_article_topic(s, a) => false
      case (a, s) if (
        subscriber_likes_article_provider(s, a) ||
        subscriber_likes_article_topic(s, a)
        ) => true
      case _ => false
    }
  }
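
As a concrete trace using the sample data, consider @Bob and article 3001, which is published by $CNN and tagged #politics, #democrats, and #detroit:

  // No Block rule matches.
  // @Bob's Allow rule for ($CNN, #detroit) matches, so the article is visible
  // even though he dislikes both $CNN and #democrats.
  article_is_visible_to_subscriber("3001", "@Bob") // expected: true

This agrees with the console output shown below, where article 3001 appears in @Bob’s feed.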

Result Collection

Here is our main Java method that drives the logic:

  private void run() {
    try {
      DataLoader loader = new DataLoader();
      String dataText = loader.load();
      
      boolean dataOk = RulesEngine.loadData(dataText);
      if (!dataOk) {
        throw new Exception("Invalid data text.");
      }
      
      RulesEngine.show_all_news();

      if (testing) {
        boolean testsAllPass = RulesEngineTest.runAllTests();
        if (testsAllPass) {
          System.out.println("Test run succeeded.");
        } else {
          System.out.println("Test run did NOT succeed.");
        }
      }
    } catch (Exception e) {
      e.printStackTrace();
    }
  }

Here is the Scala logic for the RulesEngine.show_all_news() method call:

case class Article(id: String, Provider: String, Contents: String)
case class Subscriber(handle: String)
case class Provider(symbol: String)
case class SubscriberFeed(subscriber: Subscriber, articles: Seq[Article])
case class ArticleReaders(article: Article, readers: Seq[Subscriber])
case class ProviderReaders(provider: Provider, readers: Seq[Subscriber])

  . . .
  
  // Print out the current Article feed for all Subscribers.
  def show_all_news = {
    println
    println("== All Subscriber Feeds ==")
    for (s <- setSubscribers.toList.sortBy{s => s}) show_news(s)
    println("====")
  }

  // Print out the current Article feed for a Subscriber. 
  def show_news(s: String) = {
    println
    println("Articles subscribed by %s:".format(s))
    println(subscriber_feed(s).toList)
    println("-- End --")
  }
  
  // Given a Subscriber,
  // return the list of current Article details that match their preferences.
  def subscriber_feed(s: String) = {
    val filteredArticles = mapArticleProviderContents.iterator.filter{case (a, _) => article_is_visible_to_subscriber(a, s)}
    filteredArticles.toList.sortBy{case (a, _) => a}.map{case (a, (p, c)) => Article(a, p, c)}
  }

Here is the resulting console output:

== All Subscriber Feeds ==

Articles subscribed by @Alice:
List(Article(1001,$ESPN,Tigers sign Prince Fielder.), Article(1002,$ESPN,Peyton Manning home team locker to be used by arch-nemesis Tom Brady?), Article(2001,$FOX,Newt nabs South Carolina.), Article(3001,$CNN,Obama visits Michigan.), Article(3002,$CNN,Obama urges congress to act in election year.), Article(4001,$MSNBC,Mitt makes moves in Florida.), Article(4002,$MSNBC,Celebrities attending Super Bowl parties.))
-- End --

Articles subscribed by @Bob:
List(Article(1001,$ESPN,Tigers sign Prince Fielder.), Article(2001,$FOX,Newt nabs South Carolina.), Article(3001,$CNN,Obama visits Michigan.), Article(4001,$MSNBC,Mitt makes moves in Florida.))
-- End --

Articles subscribed by @Chris:
List(Article(3001,$CNN,Obama visits Michigan.), Article(3002,$CNN,Obama urges congress to act in election year.), Article(4001,$MSNBC,Mitt makes moves in Florida.))
-- End --

Articles subscribed by @Pat:
List(Article(1001,$ESPN,Tigers sign Prince Fielder.), Article(1002,$ESPN,Peyton Manning home team locker to be used by arch-nemesis Tom Brady?), Article(2001,$FOX,Newt nabs South Carolina.), Article(4001,$MSNBC,Mitt makes moves in Florida.))
-- End --
====

Summary

The NewsGoggles engine takes dynamic feeds of news articles and subscriber preferences and applies a set of matchmaker rules against those input feeds. This yields output news feeds tailored for each subscriber.

The input feeds use a custom syntax that is parsed by the NewsGoggles engine. The current implementation was adapted from Prolog, so the data feeds use Prolog syntax. However, another syntax such as JSON or XML could be processed just as easily.

A data syntax that appears closer to actual human language could also be defined. Scala easily supports defining a custom domain-specific language parser for that data syntax.
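
As a rough sketch of that idea (a hypothetical FriendlyRulesParser, not part of the NewsGogglesScala repository), a more human-readable rule syntax such as "@Alice likes #politics" could be parsed like this:

  import scala.util.parsing.combinator._

  // Parses rules of the form:  @Alice likes #politics   or   @Bob dislikes $CNN
  class FriendlyRulesParser extends JavaTokenParsers {
    def handle: Parser[String] = """[@$#]\w+""".r
    def verb: Parser[String] = "likes" | "dislikes"
    def rule: Parser[(String, String, String)] = handle ~ verb ~ handle ^^ {
      case subject ~ v ~ target => (subject, v, target)
    }
    def rules: Parser[List[(String, String, String)]] = rep(rule)
  }

  object FriendlyRulesDemo extends App {
    val parser = new FriendlyRulesParser
    val text = "@Alice likes #politics @Bob dislikes $CNN"
    println(parser.parseAll(parser.rules, text))
    // Expected output ends with: List((@Alice,likes,#politics), (@Bob,dislikes,$CNN))
  }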

In addition to easier access to breakpoint debugging, I see value in Scala over Prolog in terms of its likely acceptance by corporate architects in a Java application shop. The Scala compiler produces regular Java byte-code, compatible with JVM 1.5 and higher. At run time, a Scala library JAR file must be deployed with the application, but this is no different from any other open-source library used to enhance Java developer productivity.

The approach demonstrated here can easily be integrated into any business application that requires a data-matching rules engine.
