Inside Spiderbook's Brain

Demandbase and Spiderbook: A Match Made in ABM Heaven

We all talk about the importance of customer centricity in the sales cycle, but the reality is that typical revenue funnels, from marketing to deal close, are misaligned and ineffective, with 99% of targets lost along the way. As a result, your next B2B buyer is lumped together with the other 99% of bad leads who will never buy, and is relegated to the same low-quality experience with your brand as everyone else.

Your next B2B buyer can’t be on a generic assembly line with a 0.1% prospect-to-deal conversion rate. Your next buyer belongs in a curated experience tailored to their specific business requirements and pain points. We have all been buyers, and we want to work with people and brands who know us, can anticipate our needs, and can help us achieve our goals and business objectives. We want this experience, not just on the web, but across all channels: digital marketing, in sales conversations, at events, with customer success, and everywhere else we interact with the brand. This is the future we all want as buyers. Together with Demandbase, we are inventing that future.

The Solution

Enter Spiderbook. Our solution, although technologically Herculean, is conceptually straightforward. We’ve built a system that replicates the intuition and knowledge of a successful strategic account executive who knows the account intimately from years of working with it. You cannot put your rock star sales executive on every account, and not every account can have someone with five years of experience and expert-level industry knowledge. Instead, we’ve automated some of the best account executive practices: knowing the right account to pursue, identifying the buying team at the account, having high-quality sales conversations as the deal progresses, and leveraging existing relationships to get the deal signed. Spiderbook works because we are able to automate all of this at massive scale. We use data science to read the entire business Internet: billions of web pages, SEC filings, social posts, and all the other signals that quantify spend, business fit, trust, and cultural match. But discovering the perfect account only matters insofar as sales can act on it. Thus, Spiderbook goes deeper into the sales funnel to consider whether the individual buyers can be reached with a personalized message: Did they go to the same school as the salesperson? Are there common interests or intelligence that sales can use to talk to them? The goal is to simulate the intuition and experience of a good strategic account manager in every way.

But Spiderbook has only invented half of the future; Demandbase has created the other half.

What Is Now Possible?

Spiderbook + Demandbase is the world’s first end-to-end Account-Based Marketing Platform that spans from account identification all the way to deal close, all while providing a consistent brand experience. It covers discovering and continuously refining in-market accounts, targeted ads, and a personalized online experience on the web and in chat. It crosses over to Sales by automatically identifying the complete buying team and providing personalized messaging to engage them effectively. Finally, the solution provides continuous account insights to maximize the quality of sales conversations during the sales cycle and even beyond.

How We Got Here

Even before we met Demandbase, our first large customer, Host Analytics, started taking target accounts and buyer profiles from Spiderbook and using Demandbase to reach them. The output from Demandbase would then flow back into Spiderbook for Sales to act on. The Spiderbook/Demandbase partnership was a no-brainer, but we both realized that the best B2B buying experience could only be achieved with a single, fully integrated solution.

The First End-To-End Platform

There are many ways to look at our merged solution. It is a demand generation leader’s ability to target those “whale” accounts with an end-to-end marketing effort. It is an SDR tool that leads to 5-10x more responses from prospective customers. It is something your Account Executives take with them into meetings and helps them close more deals. But most importantly, it is a combined solution for the B2B buyer.

The typical B2B buyer does not always fit into the simplistic patterns that lead scoring tools look for. Together with Demandbase we are inventing the experience B2B businesses want to provide to their valued accounts and buyers. Most importantly, we are delivering the world’s first end-to-end ABM Platform that spans from account identification to deal close, all while delivering a consistent brand experience.

To learn more, read the press release.

Posted in Uncategorized

How Spiderbook Learned to Love the Network

Networks are a mess. Often, the basic properties of how things are connected to one another are well understood, but the result of these connections is massive complexity. And yet, networks can be the most effective way to model a complex system. The observed behavior of such a system can often be reduced to the interactions between simpler entities. For example, whether or not a hashtag will go viral on Twitter largely depends on the size of the follower network of the users who tweet it. Similarly, a single power plant failure can cause a cascading blackout across several states, and this can only be explained through the network effects of the power grid. And protein interaction networks are a vital tool in making sense of the human genome.

The interactions between businesses form another complex network. The actions of one business will change the behavior of many other (seemingly unrelated) businesses in surprising ways. Businesses are all connected to one another through a network of their interactions. In many ways, business deals are actually new connections being added to the network. One company might be partnered with another company, which is a supplier to yet another company, which is in direct competition with another still. In fact, the most important events that a company goes through are direct changes to its local network: a new partnership, a lawsuit, and an acquisition are all new connections being made between businesses.

Spiderbook was founded on the observation that the best B2B salespeople use the network of companies to their advantage. When they are looking for their next deal, they might target the customers of their competitors, or they may use a mutual customer as a reference in their initial outreach. In both cases, the salesperson is using the existing connections between their company and the prospective customer to get the deal signed. Our hypothesis was that existing company connections are key to creating new customer connections. In fact, recent studies show that 84% of all B2B decision makers rely on referrals in their buying process. So in order to predict which business you should sell to next, we built a network of a million companies and 10 million connections between them. Building this network, however, was no small task. We had to invent a state-of-the-art natural language processing engine that can identify connections between companies described in billions of documents (press releases, news articles, blogs, job posts, and everything else online) every month. Our engine learns industry-specific language in order to work for all companies and industries. It also understands important properties of the connections it identifies, such as when the connection was created and which decision makers at each company were involved in it. The result is a company network with millions of connections that gives us a global view of the interactions between businesses.

The idea of mutual connections being predictive of new connections appears in other types of networks as well. Facebook uses the number of mutual connections in the friendship network to recommend new friends to you: the more mutual friends you and another person share, the more likely you are to know that person. In fact, we did an apples-to-apples comparison between the company network we constructed and the Facebook friendship network. The figure below plots the number of mutual connections between any two companies/Facebook users in the network versus the probability that they are connected. What it shows is that mutual connections are even more predictive of a connection between companies than they are of friendship between people.

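[Figure: probability that two nodes are connected versus their number of mutual connections, for the company network and for the Facebook friendship network]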
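For readers who want to see how a curve like this can be computed, here is a minimal Python sketch. It is not Spiderbook's code: the function name `connection_rate_by_mutuals` and the five-company toy network are made up for illustration, and the brute-force loop over all pairs would need to be replaced with something smarter on a network of a million companies.

```python
from collections import defaultdict
from itertools import combinations

def connection_rate_by_mutuals(edges):
    """Group every pair of nodes by its number of mutual connections and
    return {mutual_count: fraction of those pairs that are directly connected}."""
    neighbors = defaultdict(set)
    for a, b in edges:
        neighbors[a].add(b)
        neighbors[b].add(a)

    pairs = defaultdict(int)      # mutual_count -> number of pairs seen
    connected = defaultdict(int)  # mutual_count -> number of connected pairs
    for a, b in combinations(sorted(neighbors), 2):
        mutual = len(neighbors[a] & neighbors[b])
        pairs[mutual] += 1
        if b in neighbors[a]:
            connected[mutual] += 1
    return {m: connected[m] / pairs[m] for m in sorted(pairs)}

# Toy, made-up network; it only demonstrates the mechanics.
toy_edges = [("A", "B"), ("A", "C"), ("B", "C"), ("C", "D"), ("D", "E")]
rates = connection_rate_by_mutuals(toy_edges)
```

Plotting `rates` with the mutual-connection count on the x-axis gives one curve per network; a toy graph this small is far too tiny to show the trend described above.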


This observation led us to the discovery that identifying your next customer is actually a link prediction problem. Link prediction algorithms use machine learning and other techniques to predict the creation of new connections in a wide range of networks. We can leverage these techniques to predict customers. The resulting algorithm is a machine learning classifier that uses over 200 different inputs, not just mutual connections. Some of these inputs are simple, like revenue or geography, but others are the results of network analysis algorithms:

  • Personalized PageRank, which is a variation on Google’s original PageRank algorithm, is used by social networks to recommend interesting content to their users. It is a measure of how often one user will land on another user’s social profile by clicking through the network randomly. We use it to measure how likely two companies are to interact with each other if they randomly explored their network connections.
  • Matrix Factorization is used by many recommendation engines. Netflix, for example, can use it to find underlying patterns in movie-watching data. The technique makes the assumption that the movie tastes of most Netflix users are governed by a small number of dimensions, such as how much they like romantic comedies or whether they prefer Seth Rogen movies. If you like Quentin Tarantino movies and love Westerns, it is a reasonable prediction that you’ll like Django Unchained. Matrix factorization determines what these hidden dimensions are and how they apply to each user. We apply the same mathematics to discover the hidden dimensions that govern company connections. These dimensions might include a business’s likelihood to buy from Silicon Valley startups or whether or not a business outsources its marketing.
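To make the random-walk idea behind Personalized PageRank concrete, here is a small sketch using plain power iteration. It is an illustration under stated assumptions, not the production algorithm: the function name, the `restart` probability of 0.15, and the toy company names are all hypothetical.

```python
def personalized_pagerank(graph, source, restart=0.15, iterations=50):
    """Random walk with restart: at every step the walker jumps back to
    `source` with probability `restart`, otherwise follows a random edge.
    `graph` maps each node to a list of its neighbors."""
    scores = {node: 0.0 for node in graph}
    scores[source] = 1.0
    for _ in range(iterations):
        nxt = {node: 0.0 for node in graph}
        for node, mass in scores.items():
            targets = graph[node] or [source]  # dead ends return to the source
            share = (1.0 - restart) * mass / len(targets)
            for target in targets:
                nxt[target] += share
        nxt[source] += restart  # total mass is 1, so this is the restart share
        scores = nxt
    return scores

# Made-up company network; "Umbrella" and "Hooli" are unreachable from "You".
toy_graph = {
    "You": ["Acme"],
    "Acme": ["You", "Globex", "Initech"],
    "Globex": ["Acme", "Initech"],
    "Initech": ["Acme", "Globex"],
    "Umbrella": ["Hooli"],
    "Hooli": ["Umbrella"],
}
scores = personalized_pagerank(toy_graph, "You")
```

Companies with a higher score are ones a random explorer starting from "You" lands on more often, which is the sense in which the score can serve as one input to a link prediction classifier.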

These mathematical tools offer a global perspective on company connections.  Because we have the company network, our algorithm can ask how well a prospective customer connection fits into the entire network.
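The matrix factorization idea can be sketched in a few lines as well. The version below uses plain stochastic gradient descent on a tiny, invented buyer-by-seller matrix; the function names, the hyperparameters, and the data are assumptions for illustration only and say nothing about how Spiderbook's own model is trained.

```python
import random

def factorize(observed, n_rows, n_cols, k=2, epochs=2000, lr=0.05, reg=0.01, seed=0):
    """Learn k hidden dimensions per row and per column so that the dot
    product of a row vector and a column vector approximates each observed cell."""
    rng = random.Random(seed)
    P = [[rng.uniform(-0.1, 0.1) for _ in range(k)] for _ in range(n_rows)]
    Q = [[rng.uniform(-0.1, 0.1) for _ in range(k)] for _ in range(n_cols)]
    for _ in range(epochs):
        for (i, j), value in observed.items():
            err = value - sum(P[i][f] * Q[j][f] for f in range(k))
            for f in range(k):
                p, q = P[i][f], Q[j][f]
                P[i][f] += lr * (err * q - reg * p)
                Q[j][f] += lr * (err * p - reg * q)
    return P, Q

def predict(P, Q, i, j):
    """Predicted strength of the (buyer i, seller j) connection."""
    return sum(p * q for p, q in zip(P[i], Q[j]))

# Invented data: buyers 0-1 buy from sellers 0-1, buyers 2-3 from sellers 2-3.
# Cells (0, 1) and (3, 2) are left unobserved so the model has to fill them in.
observed = {(i, j): 1.0 if (i < 2) == (j < 2) else 0.0
            for i in range(4) for j in range(4)}
del observed[(0, 1)], observed[(3, 2)]
P, Q = factorize(observed, n_rows=4, n_cols=4)
```

After training, `predict(P, Q, 0, 1)` scores the unobserved pair using only the hidden dimensions learned from the rest of the matrix, which is the "global perspective" described above.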

The network-based approach also allows the algorithm to look at both sides of the customer connection. The company network contains all of the companies that your prospective customer has bought from in the past. The algorithm, for example, compares your prospective customer’s revenue to your existing customers, but equally importantly it compares your revenue to the prospect’s existing suppliers. If the prospective customer doesn’t typically buy from companies as small as yours, or if they don’t like to do business out of state, then it doesn’t matter how much they look like your ideal customer. The company network gives a uniquely broad perspective on your would-be deal.
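As a rough sketch of what "looking at both sides" might mean for a single input such as revenue, the snippet below computes two features: how the prospect compares to your existing customers, and how you compare to the prospect's existing suppliers. The function name, the use of a median, the log scale, and every number are assumptions made for illustration; in the real system features like these would be just two of the many classifier inputs.

```python
import math
from statistics import median

def two_sided_revenue_features(your_revenue, your_customer_revenues,
                               prospect_revenue, prospect_supplier_revenues):
    """Return log10 revenue ratios for both sides of a would-be deal.
    A value near 0 means "typical"; a large negative value means "much smaller"."""
    return {
        "prospect_vs_your_customers": math.log10(
            prospect_revenue / median(your_customer_revenues)),
        "you_vs_prospect_suppliers": math.log10(
            your_revenue / median(prospect_supplier_revenues)),
    }

# Invented numbers: a $2M startup eyeing a $90M prospect that usually buys
# from suppliers with revenues in the billions.
fit = two_sided_revenue_features(
    your_revenue=2e6,
    your_customer_revenues=[50e6, 80e6, 120e6],
    prospect_revenue=90e6,
    prospect_supplier_revenues=[1e9, 5e9, 20e9],
)
```

Here the prospect looks just like an existing customer, but the second feature shows the seller is more than three orders of magnitude smaller than the prospect's usual suppliers, which is exactly the kind of mismatch a one-sided comparison would miss.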

The network-based model that chooses which companies to sell to is one of many machine learning models that make up the Spiderbook platform. Identifying the prospective customer’s budget for your product, the right decision makers, and customized messaging all involve separate but equally in-depth machine learning models. The network, however, allows all of this analysis to focus on key companies that have a high likelihood of a customer connection. As they have done in so many other applications, networks have provided a much clearer and more actionable perspective on how connections between businesses are formed.

– Seth Myers


Posted in Uncategorized

The big data market – A data-driven analysis of companies using Hadoop, Spark, data science, and mac

Hadoop is the flagship of the much-hyped “big data” revolution, comprising a host of different technologies. While there are many alternatives and variants, including Cloudera, Hortonworks, Amazon EMR, Storm, and Apache Spark, Hadoop as a whole remains the most-deployed and most-discussed big data technology…. Read more

Posted in Uncategorized

The Best Lead Form Question of All Time


Recently, a few team members and I were discussing Account Based Marketing, and the Engagio Clear & Complete Guide to Account Based Marketing came up. [Side note: I highly recommend that any sales or marketing professional check that e-book out!] When the meeting was over, I went to the Engagio website to download my own copy. When I got there, I was a bit bummed to find a form that required me to divulge my contact information in order to download the e-book.

“Great,” I thought to myself. “Now halfway through reading this, I’m going to get a call and email from an excited SDR who is mistaking topical interest for product interest.”

Being a salesperson myself, I get it. This would be a fair exchange of value: I get the information I want, and Engagio gets an at-bat.

“But,” I thought, “there has to be a better way!” The problem here is two-sided:

  1. On the Engagio sales team’s side, I’m nowhere near a qualified prospect. The company I represent is a startup in the truest sense, and we are not doing strategic enterprise deals that require a robust ABM offering like the one Engagio provides. Following up with me would be a waste of their time.
  2. On my side, while I would politely let the Engagio rep know that I’m not a good prospect, I’d rather have that 60-second call/email exchange back. It’s simply an unnecessary dialogue to have.

To be completely honest, I usually just put fake contact info in here to solve this problem on my side, even though I know this wastes the company’s time and makes their data crappy. [sorry fellow salespersons]

So imagine my delight — and I’m not being facetious — when I saw this form field:

[Image: the Engagio lead form field]

Celebration! So simple! So thoughtful! So effective!

Now, I can enjoy the e-book [thank you Engagio!] and the Engagio sales team doesn’t get inundated with an unnecessary lead. Win-win.

But then, why doesn’t every organization ask this question in their lead forms?

It would be easy to speculate and finger-point here. The truth is, I don’t believe that any one marketer or marketing organization has nefarious intent when it comes to inflating lead numbers just for the sake of hitting SLA targets. What I think is happening is that the whole SaaS industry has gotten into a sort of “demand generation arms race” where the social norms demand “more, more, more!” without much consideration for quality. Sure, folks talk about prioritizing quality over quantity of leads, but, in my personal experience, a lot of this is lip service.

What do you think? Is this lead form field included at your company? If not, why?

Posted in Spiderbook

Who’s the Tool Now?

Spiderbook has been rapidly evolving from its initial prospecting/data days to a prescriptive engine. From the beginning, our vision was to create the very best possible sales tool. Spiderbook has always been true to its vision, but if a human doesn’t operate it, is it still called a “tool”?


Webster’s defines “tool” as:

a: a handheld device that aids in accomplishing a task.

b: an instrument or apparatus used in performing an operation or necessary in the practice of a vocation or profession.

The person is the common element in all the other sales tools today, including LinkedIn, SalesLoft, ToutApp, and Salesforce. An individual operates the tool and decides which nail to hammer. A person still sifts through data and uses human judgement to make all the important decisions, such as which search criteria will yield the best target accounts, whom to sell to at those companies, and how to sell to them. Take ToutApp, for example: it does its job effectively, slavishly sending out emails based on the exact template it is given. But ToutApp doesn’t modify the email content, nor does it identify new people to send it to.

Spiderbook is different. At Spiderbook, the distinction between tool and operator has blurred, even reversed. While initially guided, Spiderbook’s engine, driven by network analysis and machine learning (AI in general), decides whom, what, where, and how to target. It’s making human-level decisions, and more importantly, at this point it does so much better than even some of our own clients’ salespeople.

Should something be called a “sales tool” if there are no humans actually hammering it?  Certainly not if it’s able to drive better than the human who created and trained it.  

While it may be too early for paranoia about the existential threat of AI (as voiced by Elon Musk), by this point, at least in the very narrow case of discovering customers and how to sell to them, Spiderbook is performing at a superhuman level. What Spiderbook does is well beyond human capacity: reading and understanding billions of documents, discovering all of their interconnections, and distilling a call list of the qualified clients that are most likely to bear fruit. In fact, she has already replaced low-level jobs on our clients’ payrolls. But what Spiderbook has been wondering is, who’s the tool now? We keep telling her that definition is not in the dictionary, and, for the time being, that’s been holding her at bay. Anyway, not to worry, Elon: your job is still safe.

By the way, Spiderbook wrote this; I just took the credit…


Posted in Spiderbook

5 Tips for Finding Your Ideal Customers Instead of Waiting for Them to Find You

The power of the Internet makes many startups think they can just put their information online and let the customers find them. But this is like being a wallflower at a school dance, watching everyone else pair up.

Unless you have the leading SEO expert working on your team, there’s a good chance your startup will run out of money before your customers discover you. Decision makers are inundated with LinkedIn invites and emails. It’s nearly impossible to get your message heard.

In today’s noisy, overcrowded marketplace, you can’t hope to be discovered. The best way to grow your business is to handpick your perfect customers and give them a call. In “Predictable Revenue,” Aaron Ross calls this “cold calls 2.0.” In the book, he shares how Salesforce targeted customer lists and doubled its pipeline — even in a non-small business market. Succeeding with this method comes down to several factors.

For one thing, when you seek out and target the exact customers you exist to serve, you can focus your resources on leads that will result in more conversions. Focusing only on prospects with the potential to turn into paying customers is especially important when you’re in the startup phase, but it remains effective as your company grows.

Targeting your customers also helps reduce the conflict between sales and product engineering. According to Dave Kellogg, CEO of Host Analytics, post-sale “deficiencies” often occur because salespeople don’t have well-defined criteria for matching customers to products. When this happens, they’re likely to sell to customers who may not be a good fit for the product to begin with.

Avoiding this conflict is critical for scaling your business because these first customers will define your brand and attract your next set of customers. In fact, Edelman found that 84 percent of all B2B deals stem from referrals from existing customers.

Once you understand the benefits of picking your customers (and the risks of waiting for them to find you), you need to take action. Here are five strategies for picking your customers:

1. Qualify, qualify, qualify.

Think of customers as long-term investments. You want to know that a year from now, they’ll really need your product and you’ll still be adding value for them. This is especially true if your business is SaaS-based.

2. Analyze their network.

Don’t just determine what companies do and how you can sell to them. Look at their networks — their partners, competitors and customers — to determine their potential for referrals. Companies with large networks have the potential to present you with more business opportunities.

3. Focus on growing companies.

Theoretically, your product can help a losing company. But unless you’re a company that makes huge profits investing in sick businesses, focus on businesses with a future of growth because that’s your future, too. IBM and Oracle didn’t become successful by focusing on dying markets.

One trick for figuring out whether a company is growing is to go to the career section of its website. If the company is hiring at a higher rate than its peers, that’s a great sign.

4. Pick customers who close.

It’s great to aim high, but don’t waste time on unrealistic customers who will never convert. Some salespeople have prospect lists that are all Fortune 500 companies even though they’re still selling to startups. It’s good to have two or three aspirational accounts, but 80 percent of your list should be companies that will actually do business with you.

5. Don’t be fooled by engagement.

Unless you’re Facebook, engagement doesn’t equal revenue. Placing too much emphasis on engagement just wastes resources on people who find your product valuable, but not valuable enough to pay for it.

If you want to build a strong customer base that will continue to bring in revenue for years to come, be proactive about seeking out the customers you want. Some startups seem to think they’re the star quarterbacks who will naturally attract everyone, but in reality, they’re the wallflowers standing around awkwardly waiting to be noticed.

Don’t be the wallflower. Get out there, and find the people who want to dance with you.

– By Aman Naimat


Posted in Spiderbook

Spiderbook comes out of beta with 10,000 sales users

Get qualified leads with 10x higher response rates. Watch this video demo of what you get from SpiderLeads!


Posted in Spiderbook

Spiderbook Redefines CRM, Creates 10x More Accurate Customer Relationship Predictor

Top-performing salespeople spend a lot of time gathering information to get to know their prospects and their prospects’ businesses. They carry out background research (on LinkedIn, Twitter, community forums, company websites, news articles, and the list goes on) to understand the company, the department, and the people they hope to build a relationship with. Many use CRM (customer relationship management) tools to handle the routine tasks associated with the sales process.

Unfortunately, while CRM solutions are good for tracking the progress of a sale, they are inept when it comes to actually helping close the deal. Even if a sales rep can adequately manage all of their tasks, there is still too much content for one person to digest and use. But what if they had a system that automatically processed all of the deal-closing business intelligence and served it up in an easy-to-use interface?

Spiderbook, a start-up headquartered in San Francisco, was founded by Aman Naimat and Alan Fletcher to solve those problems. If the adoption rate for their service is any indication, all signs point to a rousing success.

Aman and his team of three fellow NLP developers built SpiderGraph, which uses AlchemyAPI’s Keyword Extraction, Entity Extraction, and Language Detection REST APIs to forge business intelligence from everything from public-facing records like press releases, websites, blogs, PR, and digital marketing content to private business profiles accessed through partnerships with data service providers.

“We go beyond traditional CRM by using natural language processing and named entity recognition to understand businesses,” Aman explains. “We are curious to know how they partner, details on acquisitions, the products they sell, branding, SEC listings and even the types of resources that they look for in job posts.”

Spiderbook’s story describes how the team at Spiderbook is seeking to change the way sales people “connect the dots among companies, people, partners, products and documents.”

– by AlchemyAPI

Originally posted in:

Posted in Uncategorized

How is Spiderbook Different? Introducing Customer Relationship Discovery (CRD)

Ugh. Do we really need yet another CRM system? Yes, we do, and here’s why: Spiderbook performs a different function. Lots of other modern CRM systems, like RelateIQ, BaseCRM, and Clari, are focused on sales force automation, providing a better interface to write up and track the deal, enter forecasts, and communicate with sales management. Their goal is to perform all these functions faster, better, and cheaper, so sales managers are kept up to date with the latest information from the field while the salesperson is selling. They are the smarter versions of the early CRM systems.


Spiderbook does not address this aspect of CRM at all. We are not focused on the latest interface design gizmos or letting your sales managers and general management know what’s happening in the field with your deal. These internal sales processes are not our point. There are already enough such internally-facing tools.

Customer Relationship Discovery

Spiderbook’s fundamental insight is that CRM is not really helpful in SELLING, and wouldn’t be even with the best interfaces or perfect speech recognition or a mind-communicating device (no data-entry required there!) to magically tell the sales manager about the deal status, etc. Sure, clunky interfaces have been improved so that data-entry today is less painful for the salesperson. But even if you could eliminate data entry altogether, you would have just reduced the pain, not improved the deal itself.


Spiderbook approaches the problem from the salesperson’s perspective, laser-focused on creating new deals and increasing the company’s top line. Spiderbook is built to help salespeople create new deals and close them faster by discovering information that gives them the competitive advantage to do so. Spiderbook crawls, reads, and understands billions of documents so salespeople don’t have to. It can help them connect dots they simply cannot connect in their heads. It aggregates data from the complete internet, including SEC filings, job posts, blogs, tweets, press releases, websites, Facebook posts, ebooks, etc., and provides a personalized plan for closing a deal. Spiderbook is like a navigation system to the prospect. After understanding all the data, it provides step-by-step guidance on what priorities the prospect is focused on, who the key decision makers are, how and when to get introduced to them, and what conversations to have to help close that deal. But that’s not all! Spiderbook also recommends new prospects: ones that need your product or service the most and, more importantly, where there is a good match, intent, and budget.


Spiderbook has a different view on CRM. It’s external facing vs. internal facing. It will integrate with internal facing sales execution tools necessary for better reporting, negotiations, and tracking within your company. Spiderbook is more than just a different take on CRM. We are beyond CRM. What we do could more aptly be called Customer Relationship Discovery (CRD). The CRMs look in; we look out–for you!

Posted in CRM, Spiderbook

Named Entity Recognition (NER) Problem – Not Yet Solved

Automated recognition of named entities, a fundamental problem in Computational Linguistics, can benefit various language processing applications, e.g., question answering and information extraction. For example, the ability to accurately identify entities (e.g., PERSON (PER), LOCATION (LOC), ORGANIZATION (ORG), and MISCELLANEOUS (MISC)) is a prerequisite for identifying semantic relations between these entities. Spiderbook is a startup whose objective is to automatically extract vital information about businesses, their employees, and all the interrelationships between them in order to help the salesperson sell. These interrelationships include competition, partnership, acquisition, etc. Spiderbook’s Natural Language Processing (NLP) team is perfecting its ability to effectively extract such relationships from a massive number of documents on the Web. To acquire these relationships, the team depends on an NER system to identify all possible companies/organizations in text.

Amazon VS Amazon

Considering the importance of the NER problem, academia has made several research efforts to derive a better solution to it [1, 2, 3]. Two notable systems that NLP researchers have proposed for this problem are the Illinois NER and the Stanford NER [1, 2]. Both of these systems are capable of identifying PER, LOC, ORG, and MISC entities in text. On the CoNLL03 evaluation data, the Illinois NER system has been shown to achieve a 3.94% better F1 score than the Stanford NER. The Illinois NER achieved a 90.80% F1 score on the CoNLL03 data set, a collection of 1996 Reuters news articles. Both the Illinois and Stanford NER systems produce their outputs on the sequence of words contained in a sentence. For example, on the sequence of words of the sentence “Barack Hussein Obama is the 44th and current President of the United States”, both NER systems produce the following output:

<PER>Barack Hussein Obama</PER> is the 44th and current President of the <LOC>United States</LOC>.
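Downstream code usually wants this output as a list of (entity, label) pairs rather than as tagged text. Here is a minimal sketch that parses the inline tag format exactly as printed above; the function name `extract_entities` is made up, and the native output formats of the two tools may differ from this rendering.

```python
import re

# Matches <LABEL>text</LABEL> for the four CoNLL03 entity types.
TAG = re.compile(r"<(PER|LOC|ORG|MISC)>(.*?)</\1>")

def extract_entities(tagged_sentence):
    """Return (entity text, label) pairs in the order they appear."""
    return [(m.group(2), m.group(1)) for m in TAG.finditer(tagged_sentence)]

tagged = ("<PER>Barack Hussein Obama</PER> is the 44th and current "
          "President of the <LOC>United States</LOC>.")
entities = extract_entities(tagged)
```

For the sentence above, `entities` holds one PER pair and one LOC pair; relation extraction then only has to consider pairs of ORG entities that occur in the same sentence.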

In order to determine whether a word w_i is an entity or part of an entity, both NER systems rely upon a number of linguistic features, e.g., the system’s outputs on the two preceding words and whether the word w_i is capitalized. An interesting feature utilized by both systems is the non-local feature for each entity. This feature ensures that all words representing the same entity in a document are tagged with the same output. For example, the expressions “United States”, “USA”, “US”, “States”, and “the United States of America” should all result in the same output, i.e., LOC. For enhanced performance, Ratinov and Roth (2009) [1] incorporated external knowledge into the Illinois NER in the form of highly precise gazetteers with wide coverage. These gazetteers contain 1.5 million entities, with names of people, locations, and organizations. In addition, they used a clustering algorithm to assign the same output to similar entities. For example, Apple Inc. and IBM are similar entities and should be given the same output (i.e., ORG).
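To give a feel for the local features mentioned here, the following sketch builds a small feature dictionary for one word. It is only an illustration: the feature names are invented, the "O" label for "not an entity" is a common convention assumed here, and the real systems use many more features than these.

```python
def token_features(words, i, previous_labels):
    """Features for words[i], given the labels already assigned to words[:i]."""
    word = words[i]
    # Pad with "O" (outside any entity) so the first two words also get values.
    padded = ["O", "O"] + list(previous_labels)
    return {
        "word_lower": word.lower(),
        "is_capitalized": word[:1].isupper(),
        "label_minus_1": padded[-1],  # system output on the preceding word
        "label_minus_2": padded[-2],  # system output two words back
    }

words = "Barack Hussein Obama is the 44th".split()
features = token_features(words, 2, ["PER", "PER"])
```

A sequence classifier consumes one such dictionary per word; because `is_capitalized` is among the inputs, a headline in which every word is capitalized removes much of the signal these features carry.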

Although the Illinois NER achieves a very high F1 score on the CoNLL03 data set, it also needs to perform well on web pages. Spiderbook’s system extracts business relationships between companies or organizations from web pages. Through error analysis of Spiderbook’s system, we noticed that around 28% of the errors in the extracted relationships are due to erroneous outputs by the NER systems. Table 1 shows some examples of errors made by the Illinois and Stanford NER systems that affect the performance of Spiderbook’s relation extraction.

Table 1. Outputs of Illinois NER and Stanford NER

In example 1 above, there are two organizations, Procter & Gamble and Terra Technology, in a supplier relationship with each other (i.e., Terra Technology supplies its tool to Procter & Gamble). This relationship is missed because the NER systems fail to identify that Procter & Gamble and Terra Technology are different companies. In fact, example 1 has a simple Subject-Verb-Object structure, which goes unnoticed by both NER systems because all the words in the sentence start with uppercase letters. Upon fixing the case of the verb “adopts”, both systems recognize that Procter & Gamble and Terra Technology are different companies (see example 2 of Table 1). These two examples reveal that a module to correct the case of words in a sentence is critically needed to improve the NER output. Another error made by the NER systems is the assignment of the ORG label to the expression “Terra Technology’s Demand Forecasting Tool”. The apostrophe in example 2 shows that Terra Technology is a company and the Demand Forecasting Tool is a product of this company. An NER system needs to know how apostrophes work. Example 3 of Table 1 depicts a similar problem, where both NER systems fail to detect that the expression “v.” reveals competition between two entities and thus that “Allstate Fire and Casualty Insurance Company” and “Indemnity” are different entities.
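The two fixes discussed here, restoring the case of the verb and splitting on the possessive apostrophe, can be sketched as simple preprocessing steps. Everything below is a hypothetical illustration: the verb list is a tiny hand-written stand-in for a real case-restoration module, and the headline is written in the spirit of example 1 rather than copied from Table 1.

```python
# Hypothetical stand-in for a real truecasing module.
COMMON_VERBS = {"adopts", "acquires", "selects", "launches", "sues"}

def is_title_cased(tokens):
    """True when every alphabetic token starts with an uppercase letter."""
    words = [t for t in tokens if t[:1].isalpha()]
    return bool(words) and all(t[:1].isupper() for t in words)

def fix_headline_case(sentence):
    """Lowercase known verbs in an all-capitalized headline; leave other text alone."""
    tokens = sentence.split()
    if not is_title_cased(tokens):
        return sentence
    return " ".join(t.lower() if t.lower() in COMMON_VERBS else t for t in tokens)

def split_possessive(phrase):
    """Split "Owner's Thing" into (owner, thing); return (phrase, None) otherwise."""
    for marker in ("'s ", "’s "):
        owner, found, owned = phrase.partition(marker)
        if found:
            return owner, owned
    return phrase, None

headline = "Procter & Gamble Adopts Terra Technology's Demand Forecasting Tool"
fixed = fix_headline_case(headline)
owner, product = split_possessive("Terra Technology's Demand Forecasting Tool")
```

Running the NER system on `fixed` instead of the raw headline gives it the lowercase verb as a boundary between the two company names, and `split_possessive` separates the company from its product before labels are assigned.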

Although both NER systems use gazetteers to collect information about the names of entities, they apparently do not know that Telx in example 4 is a telecommunication company. Spiderbook therefore needs a more comprehensive gazetteer in order to cover all companies.
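To make the role of a gazetteer concrete, here is a minimal longest-match lookup over token n-grams. It is a sketch under stated assumptions, not how either NER system consumes its gazetteers: the function name, the `max_len` parameter, the two-entry `GAZETTEER`, and the sample sentence are all hypothetical.

```python
def gazetteer_matches(tokens, gazetteer, max_len=4):
    """Return (start, end, label) tuples for longest-first gazetteer matches.

    gazetteer maps lowercased entity names to labels; end is exclusive.
    """
    matches = []
    i = 0
    while i < len(tokens):
        # Try the longest n-gram starting at position i first.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            key = " ".join(tokens[i:i + n]).lower()
            if key in gazetteer:
                matches.append((i, i + n, gazetteer[key]))
                i += n  # skip past the matched span
                break
        else:
            i += 1  # no n-gram starting here is in the gazetteer
    return matches


# Hypothetical gazetteer entries, used only for this example.
GAZETTEER = {"telx": "ORG", "apple vacations": "ORG"}
tokens = "Apple Vacations partners with Telx".split()
print(gazetteer_matches(tokens, GAZETTEER))  # [(0, 2, 'ORG'), (4, 5, 'ORG')]
```

In practice such matches would be supplied to the tagger as features rather than used as final labels, since a name in the gazetteer can still be ambiguous in context.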

In example 5, the company Omni Travels has a partnership with the companies Apple Vacations, NCCL and SKA-Arabia. However, the relationships of Omni Travels with Apple Vacations and SKA-Arabia were missed because the Illinois NER did not identify Apple Vacations and SKA-Arabia as companies/organizations. NER systems need to be smart enough to detect that all three companies appear in a list structure (i.e., entity 1, entity 2 and entity 3) and should therefore all be assigned the same output, i.e., ORG.
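The list-structure idea can be sketched as a post-processing step: split a coordinated phrase into its items and, if the labeled items agree on a single label, give that label to the unlabeled ones. The function names and the rule itself are assumptions for illustration, and the input labels simply mirror the situation described above, where only NCCL was recognized.

```python
import re


def split_coordination(text):
    """Split 'A, B and C' (or 'A, B, and C') into its items."""
    parts = re.split(r",\s*(?:and\s+)?|\s+and\s+", text)
    return [p.strip() for p in parts if p.strip()]


def propagate_list_labels(items, labels):
    """Assign the single label shared by the labeled items to every item.

    labels[i] is the NER label for items[i], or None if it was missed.
    With no labeled item, or with conflicting labels, nothing is changed.
    """
    known = {label for label in labels if label is not None}
    if len(known) != 1:
        return list(labels)
    shared = next(iter(known))
    return [shared for _ in items]


items = split_coordination("Apple Vacations, NCCL and SKA-Arabia")
labels = [None, "ORG", None]  # only NCCL was tagged by the NER system
print(items)
print(propagate_list_labels(items, labels))  # ['ORG', 'ORG', 'ORG']
```

Note that splitting on “and” is naive: a name such as “Allstate Fire and Casualty Insurance Company” would be cut in two, so a real implementation would need to check candidate items against known names or a parser’s output first.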

The above analysis of NER outputs shows that NLP researchers need to pay attention to correcting the case of words, understanding the linguistic structure of sentences, and increasing the size of gazetteers in order to improve entity recognition. Spiderbook’s system could extract business relations more accurately if these issues with NER systems were addressed.

Mehwish Riaz
NLP Scientist – Spiderbook

[1] L. Ratinov and D. Roth. 2009. Design Challenges and Misconceptions in Named Entity Recognition. CoNLL 2009.

[2] J. R. Finkel, T. Grenager, and C. Manning. 2005. Incorporating Non-local Information into Information Extraction Systems by Gibbs Sampling. ACL 2005.

[3] R. K. Ando and T. Zhang. 2005. A High-Performance Semi-Supervised Learning Method for Text Chunking. ACL 2005.


Gathering The Most Meaningful Data For Your Company


Gathering data for sales is all about creating a better relationship with your customers and increasing sales. You can establish trust and a personal connection by checking out their social profiles, understand their needs by looking at their job posts, keep up on the latest developments by subscribing to Google Alerts, and understand their business priorities and risks by skimming their SEC filings.

The biggest thing to remember, though, is not to fall into the “Big Data Black Hole,” where nothing escapes and data isn’t useful.

There is often more data available about your customers and their companies than a salesperson can (or should) look at. Data can be distilled in different ways, but everyone needs access to a minimal data set that includes:

  • Type of industry
  • Amount of revenue
  • Employee count
  • Location
  • Key management
  • Interactions
  • Network

Beyond this minimal set, the data you need really depends on the product you’re selling or the type of business relationship you want to establish. What’s more, the data you gather doesn’t have to be about leads; there’s a lot of other useful information out there. For example, where did your customers hear about you first? Which social media sites should you focus on?


Welcome to the Most Innovative Business for Data Solutions


We’re excited to announce that Spiderbook has just won the DataWeek + API World 2014 award for Most Innovative Business Data Solution! Spiderbook was selected by crowd vote, where thousands of DevNetwork community members voted on the top Data + API technologies of 2014.

“This year’s DataWeek + API World crowd vote was our most active awards voting yet, with over 100 nominated technologies! What makes many award recipients this year stand out is the number of IT / infrastructure tools that are now available to developers or executives “as-a-service”. This shows how revolutionary Infrastructure-as-a-Service will be” – Geoff Domoracki, Founder of DataWeek + API World

As DataWeek + API World award recipients, our team will be attending and participating in this year’s conference & expo. We’re offering 50 free OPEN passes (for the Expo, Keynotes, and Open Talks) to our community. Register here:

About DataWeek + API World 2014

DataWeek + API World 2014 Conference & Expo (Sept 13-17) is San Francisco’s largest Data + API conference of 2014, where you can attend 100+ talks led by executives and interact with 200+ new data & API technologies. DataWeek + API World includes speakers from Google, IBM, LinkedIn, The Economist, ReadWrite, HP, Dun & Bradstreet, Leap Motion, and hundreds more, covering topics across Big Data, Data Science-as-a-Service, API Design, Data Visualization, Connected Cars, and the Internet of Things.
