Thursday, June 12, 2008

Best Practices for Securing Enterprise Search

Overview: Although this session provides technical details, it also offers a glimpse at the issues behind enterprise search for those less familiar with the subject. The challenge is to provide easy access to the data and content employees need while still protecting sensitive information. Join us as we share best practices for delivering secure yet comprehensive results for leading search engines.
  • Speaker - Mark Bennett, Vice President, New Idea Engineering, Inc.
  • Speaker - Miles Kehoe, CEO, New Idea Engineering, Inc.
My Notes:

One of the common themes during the conference was the need to find the stuff you need. Of course the flip side is to make sure people do not find the stuff they should not be able to find.

You’ll be amazed what you can find on your own company’s network. Try searching for:
  • confidential
  • highly confidential
  • salaries
  • performance review
  • Obscenities
  • Racial and gender slurs
(The session leaned a little on scare tactics, but there are issues with exposing content inside the enterprise. Lots of companies have grown complacent and rely on security by obscurity: "I can't even find my own stuff inside the enterprise, so how can someone else find my stuff?")

The Good: Single sign-on and LDAP directories make security management easier.

The Bad: Spidering for content means that the spider has to be a super-user that can see everything.

The Ugly: There are lots of holes in search technologies.

They focused on what the right level of security is: the macro level, the document level, or the field level?

Early binding versus Late Binding.


With early binding, the security is applied as the information comes into the system, at index time. Late binding applies the security after the search is made. FAST was doing a hybrid of the two. Late binding is not as good, because the security verification happens only after the documents are retrieved.


Early Binding: IndexTime
1. I have document “http://corp.acme.com/sales/forcast.html”, what are the group IDs for it?

Early Binding: SearchTime
1. I have Session ID “14729834416”, which User is that for?
2. I have User “Jones”, which groups is he in?
3. Transform the list of Group IDs into a Native Query Filter
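
The index-time and search-time steps above can be sketched in a few lines of Python. This is only a toy illustration, not how any vendor implements it: the `EarlyBindingIndex` class, the group names, and the handbook URL are all made up for the example, and a real engine would push the group filter into its native query language instead of a list comprehension.

```python
class EarlyBindingIndex:
    """Toy index that stores each document's allowed group IDs at index time."""

    def __init__(self):
        self.docs = []  # list of (url, lowercased text, allowed group IDs)

    def add(self, url, text, group_ids):
        # Index time: "what are the group IDs for this document?"
        self.docs.append((url, text.lower(), frozenset(group_ids)))

    def search(self, term, session_id, sessions, memberships):
        user = sessions[session_id]            # step 1: session -> user
        groups = frozenset(memberships[user])  # step 2: user -> groups
        # Step 3: the group list acts as a filter inside the query itself,
        # so unauthorized documents never reach the result list.
        return [url for url, text, allowed in self.docs
                if term.lower() in text and allowed & groups]


index = EarlyBindingIndex()
index.add("http://corp.acme.com/sales/forcast.html", "Q3 sales forecast", {"sales"})
# Hypothetical second document, readable by everyone.
index.add("http://corp.acme.com/hr/handbook.html", "Sales conduct handbook", {"everyone"})

sessions = {"14729834416": "Jones"}
memberships = {"Jones": {"everyone"}}
print(index.search("sales", "14729834416", sessions, memberships))
```

Here Jones is not in the "sales" group, so only the handbook comes back, even though both documents match the search term.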


Late Binding:
  • No work needed at Index time
  • Would appear to be a simpler/better design

Late Binding: SearchTime
1. I have Session ID “14729834416”, can I access document “http://corp.acme.com/sales/forcast.html”, Yes or No?
2. (repeat for every match)
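
To make the "repeat for every match" cost concrete, here is a rough Python sketch of late binding. Everything in it is an assumption for illustration: `CountingSecurityService` stands in for whatever system really answers the Yes/No question, and the pipeline and handbook URLs are invented.

```python
class CountingSecurityService:
    """Stand-in for the real security system; counts how often it is asked."""

    def __init__(self, allowed_pairs):
        self.allowed_pairs = set(allowed_pairs)  # (session_id, url) pairs
        self.calls = 0

    def can_access(self, session_id, url):
        self.calls += 1
        return (session_id, url) in self.allowed_pairs


def late_binding_search(term, session_id, documents, security):
    # The index knows nothing about permissions, so every match is retrieved.
    matches = [url for url, text in documents.items()
               if term.lower() in text.lower()]
    # Then each match is checked individually: "Yes or No?"
    return [url for url in matches if security.can_access(session_id, url)]


documents = {
    "http://corp.acme.com/sales/forcast.html": "Q3 sales forecast",
    "http://corp.acme.com/sales/pipeline.html": "Sales pipeline",  # hypothetical
    "http://corp.acme.com/hr/handbook.html": "Employee handbook",  # hypothetical
}
security = CountingSecurityService(
    {("14729834416", "http://corp.acme.com/sales/pipeline.html")}
)
results = late_binding_search("sales", "14729834416", documents, security)
print(results, security.calls)
```

Two documents match "sales", so the security service is asked twice to return a single result. With thousands of matches, that is thousands of checks per query.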

The problem with early binding is latency. If someone's access changes after the last index run, the index still reflects the old permissions, so they can still find documents they should no longer see until the content is re-indexed. The hybrid approach is a good way to deal with this issue.
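
The session did not spell out how the hybrid works internally, so the following is only my guess at the general idea: use the group IDs stored in the index as a cheap first filter, then run a live check on the few survivors to catch permissions that changed since the last index. The function name, the group names, and the contacts URL are all hypothetical.

```python
def hybrid_search(term, user_groups, indexed_docs, can_access_now):
    # Early binding pass: filter on the group IDs captured at index time.
    # These may be stale if permissions changed since the last crawl.
    candidates = [url for url, (text, groups) in indexed_docs.items()
                  if term.lower() in text.lower() and groups & user_groups]
    # Late binding pass: a live check, but only on the survivors.
    return [url for url in candidates if can_access_now(url)]


FORECAST = "http://corp.acme.com/sales/forcast.html"
CONTACTS = "http://corp.acme.com/sales/contacts.html"  # hypothetical

# What the index recorded at the last crawl.
indexed_docs = {
    FORECAST: ("Q3 sales forecast", {"sales"}),
    CONTACTS: ("Sales contacts", {"sales"}),
}
# What the source system says right now: the forecast was locked down later.
live_acl = {FORECAST: {"sales-managers"}, CONTACTS: {"sales"}}
jones_groups = {"sales"}


def can_access_now(url):
    return bool(live_acl[url] & jones_groups)


print(hybrid_search("sales", jones_groups, indexed_docs, can_access_now))
```

The stale index would still hand Jones the forecast, but the live check drops it, so only the contacts page is returned.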

Their take on vendors:

FAST Search & Transfer
  • Supports Early and Late binding
  • Can use BOTH together
  • Hybrid approach “Best of both Worlds”
  • Gets along very well with Microsoft Active Directory
  • FAST SAM = Security Access Module
  • Based on Windows technology
  • Can still use your own application level logic if you prefer
Google Appliance
  • Late‐Binding only
  • “Spin” is low latency, but it is actually a compromise...
  • Could heavily load security infrastructure
  • Does use some caching to lighten the load
  • Caching decreases response time = good
  • Caching increases latency (ACL changes)
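
The caching trade-off in the last two bullets can be shown with a small time-limited cache in Python. This is a generic sketch, not a description of how the Google Appliance actually caches: the `CachedAccessCheck` class, the 300-second lifetime, and the fake clock are all assumptions made for the example.

```python
import time


class CachedAccessCheck:
    """Remembers Yes/No answers for a while so the security system is asked less."""

    def __init__(self, check, ttl_seconds, clock=time.monotonic):
        self._check = check        # the real (slow) access check
        self._ttl = ttl_seconds    # how long a cached answer is trusted
        self._clock = clock        # injectable so the example is repeatable
        self._cache = {}           # (user, url) -> (allowed, time cached)

    def can_access(self, user, url):
        key = (user, url)
        now = self._clock()
        hit = self._cache.get(key)
        if hit is not None and now - hit[1] < self._ttl:
            return hit[0]          # fast path, but possibly out of date
        allowed = self._check(user, url)
        self._cache[key] = (allowed, now)
        return allowed


URL = "http://corp.acme.com/sales/forcast.html"
live_acl = {("Jones", URL)}
fake_now = [0.0]
cached = CachedAccessCheck(lambda user, url: (user, url) in live_acl,
                           ttl_seconds=300, clock=lambda: fake_now[0])

first = cached.can_access("Jones", URL)   # asks the real system
live_acl.clear()                          # Jones's access is revoked
fake_now[0] = 60.0
stale = cached.can_access("Jones", URL)   # still answered from the cache
fake_now[0] = 301.0
fresh = cached.can_access("Jones", URL)   # cache expired, real system asked again
print(first, stale, fresh)
```

The middle answer is the latency problem: for up to five minutes after the ACL change, Jones still gets a "Yes".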
Endeca
  • Out of the box is Early Binding only
  • Mitigated by low latency for document changes
  • Provides accurate document counts by user
  • General term is “Record Filters”
  • Or can use “joins” to a fulltext ACL index
  • RRN: Record Relationship Navigation
  • Late binding via custom code
Microsoft SharePoint
  • Late binding
  • Microsoft calls it result trimming
Search Structures

Monolithic search
With a monolithic search, one index pulls in everything across company boundaries. End users also run their searches in that same single system. The spider needs a super-user logon to crawl all of the systems.

Federated search
Different search engines are in place. The federator queries each of the underlying systems, passing the user's logon through to run the search. Each search system runs the search in its own way. The big problem is applying relevancy to the combined results from the federated search. You also have to deal with the varying search syntaxes of the underlying systems.
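
A bare-bones federator might look like the Python sketch below. The two engines, their query syntaxes, and the credential check are invented for illustration. Because each engine scores results its own way, the sketch does not try to compare scores at all and simply interleaves the ranked lists round-robin, which is my own stand-in for the relevancy problem and not something the speakers recommended.

```python
from itertools import zip_longest


def federated_search(query, user_credentials, engines):
    ranked_lists = []
    for translate, search in engines:
        native_query = translate(query)  # each engine has its own syntax
        # The user's own credentials are passed through, so each engine
        # enforces its own security.
        ranked_lists.append(search(native_query, user_credentials))
    # Scores from different engines are not comparable, so take the top hit
    # from each engine, then the second hit from each, and so on.
    merged = []
    for tier in zip_longest(*ranked_lists):
        merged.extend(hit for hit in tier if hit is not None)
    return merged


# Two hypothetical back-end engines.
def wiki_search(native_query, credentials):
    return ["wiki:forecast-faq", "wiki:sales-glossary"] if credentials == "jones" else []


def crm_search(native_query, credentials):
    return ["crm:acme-account"]


engines = [
    (str.lower, wiki_search),                 # wiki wants lowercase terms
    (lambda q: f'text:"{q}"', crm_search),    # CRM wants a fielded query
]
print(federated_search("Sales", "jones", engines))
```

For Jones this interleaves the wiki's top hit, the CRM hit, and then the wiki's second hit. A user the wiki does not recognize would see only the CRM result.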

Deferred Search
For highly secured information, the results only provide a link to the separate silo, where you need to re-run the search inside that locked-down system.

(They went into even more technical material that went way over my head.)

A link to their slide deck on the Enterprise 2.0 Community Site.
