KM Space: Best Practices for Securing Enterprise Search

Overview: Although this session provides technical details, it also offers a glimpse at the issues behind enterprise search for those less familiar with the subject. The challenge is to provide easy access to the data and content employees need while still protecting sensitive information. Join us as we share best practices for delivering secure yet comprehensive results for leading search engines.

Speaker - Mark Bennett, Vice President, New Idea Engineering, Inc.
Speaker - Miles Kehoe, CEO, New Idea Engineering, Inc.

My Notes:

One of the common themes during the conference was the need to find the stuff you need. Of course, the flip side is making sure people do not find the stuff they should not be able to find.

You’ll be amazed what you can find on your own company’s network. Try searching for:

confidential
highly confidential
salaries
performance review
Obscenities
Racial and gender slurs

(The session had a little bit of a scare tactic. There are issues with exposing content inside the enterprise. Lots of companies have gotten complacent with security by obscurity. "I can't even find my own stuff inside the enterprise, how can someone else find my stuff?")

The Good: Single sign-on and LDAP directories make security management easier.

The Bad: Spidering for content means that the spider has to be a super-user that can see everything.

The Ugly: There are lots of holes in search technologies.

They focused on what the right level of security is: the macro level, the document level, or the field level?

Early binding versus Late Binding.

With early binding, the security is applied as the information comes into the system. Late binding applies the security after the search is made. Late binding is not as good, because the security verification happens only after the documents are retrieved. FAST was doing some hybrid binding.

Early Binding: Index Time
1. I have document “http://corp.acme.com/sales/forcast.html”, what are the group IDs for it?

Early Binding: Search Time
1. I have Session ID “14729834416”, which User is that for?
2. I have User “Jones”, which groups is he in?
3. Transform the list of Group IDs into a Native Query Filter
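
The early-binding steps above can be sketched in a few lines of Python. This is a toy, in-memory illustration and not any vendor's API: the names INDEX, SESSIONS, USER_GROUPS, index_document, build_filter and search are all made up, the group names and the second document are invented, and the "native query filter" is reduced to a simple set intersection.

```python
# Toy early-binding sketch. All names and data here are hypothetical.

INDEX = {}  # url -> {"text": ..., "groups": group IDs stored at index time}

SESSIONS = {"14729834416": "Jones"}          # session ID -> user
USER_GROUPS = {"Jones": {"sales", "staff"}}  # user -> group IDs


def index_document(url, text, group_ids):
    """Index time: store the document's group IDs alongside its text."""
    INDEX[url] = {"text": text, "groups": set(group_ids)}


def build_filter(session_id):
    """Search time: session -> user -> groups, used as the query filter."""
    user = SESSIONS[session_id]
    return USER_GROUPS.get(user, set())


def search(session_id, term):
    """Return only matches whose stored groups overlap the user's groups."""
    allowed = build_filter(session_id)
    return sorted(
        url
        for url, doc in INDEX.items()
        if term.lower() in doc["text"].lower() and doc["groups"] & allowed
    )


index_document("http://corp.acme.com/sales/forcast.html", "Q3 sales forecast", ["sales"])
index_document("http://corp.acme.com/hr/salaries.html", "salaries and forecast", ["hr"])

# Both documents match "forecast", but only the sales one is returned for Jones.
print(search("14729834416", "forecast"))
```

The security system is never consulted per document at search time here; the filter does all the work against what was stored when the documents were indexed.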

Late Binding:

No work needed at index time
Would appear to be a simpler/better design

Late Binding: Search Time
I have Session ID “14729834416”, can I access document “http://corp.acme.com/sales/forcast.html”, Yes or No?
(repeat for every match)

The problem with early binding is latency. If someone's access changes after the last index run, they can still see documents they should no longer have until the content is re-indexed. The hybrid approach is a good way to deal with this issue.
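
One way to picture a hybrid is sketched below. This is only my reading of the idea, not a description of how FAST implements it: the index-time ACLs narrow the candidates cheaply, and the survivors are re-checked against live permissions. All names and data (INDEXED_ACL, LIVE_ACL, USER_GROUPS, the users and groups) are hypothetical.

```python
# Toy hybrid sketch: early-binding filter first, late-binding re-check second.
# Names and data are hypothetical.

FORECAST = "http://corp.acme.com/sales/forcast.html"
SALARIES = "http://corp.acme.com/hr/salaries.html"

# ACLs as captured at the last index run (possibly stale)
INDEXED_ACL = {FORECAST: {"sales", "staff"}, SALARIES: {"hr"}}

# ACLs as they are right now; "staff" lost access to the forecast after indexing
LIVE_ACL = {FORECAST: {"sales"}, SALARIES: {"hr"}}

USER_GROUPS = {"Jones": {"staff"}, "Smith": {"sales"}}


def early_filter(user):
    """Cheap filter using the group IDs stored in the index."""
    return [url for url, groups in INDEXED_ACL.items() if groups & USER_GROUPS[user]]


def late_check(user, url):
    """Per-document check against the live permissions."""
    return bool(LIVE_ACL[url] & USER_GROUPS[user])


def hybrid_search(user):
    """Only documents that pass both the stale filter and the live check."""
    return [url for url in early_filter(user) if late_check(user, url)]


print(early_filter("Jones"))   # the stale index still lets the forecast through
print(hybrid_search("Jones"))  # the live re-check removes it
print(hybrid_search("Smith"))  # unchanged access still works
```

In this sketch the late check only runs on the short list that survived the early filter, so it catches revoked access without asking the security system about every match. It does not help with access that was newly granted after the last index run; that still waits for a re-index.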

Their take on vendors:

FAST Search & Transfer

Supports Early and Late binding
Can use BOTH together
Hybrid approach “Best of both Worlds”
Gets along very well with Microsoft Active Directory
FAST SAM = Security Access Module
Based on Windows technology
Can still use your own application level logic if you prefer

Google Appliance

Late binding only
“Spin” is low latency, but actually a compromise...
Could heavily load security infrastructure
Does use some caching to lighten the load
Caching decreases response time = good
Caching increases latency (ACL changes take longer to show up)
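
The caching trade-off can be illustrated with a small time-to-live (TTL) cache in front of the access check. This is a hypothetical sketch, not how the Google Appliance actually works: the class CachedAccessChecker, the 60-second TTL and the explicit `now` argument (used instead of a real clock to keep the example deterministic) are all assumptions.

```python
# Toy sketch of caching access decisions with a time-to-live (TTL).
# Names and numbers are hypothetical.

class CachedAccessChecker:
    def __init__(self, check_fn, ttl_seconds):
        self.check_fn = check_fn   # slow call to the real security system
        self.ttl = ttl_seconds
        self.cache = {}            # (user, url) -> (decision, time stored)
        self.backend_calls = 0

    def can_access(self, user, url, now):
        key = (user, url)
        hit = self.cache.get(key)
        if hit is not None and now - hit[1] < self.ttl:
            return hit[0]          # fast answer, but it may be out of date
        self.backend_calls += 1
        decision = self.check_fn(user, url)
        self.cache[key] = (decision, now)
        return decision


url = "http://corp.acme.com/sales/forcast.html"
acl = {("Jones", url)}  # live permissions
checker = CachedAccessChecker(lambda u, d: (u, d) in acl, ttl_seconds=60)

print(checker.can_access("Jones", url, now=0))   # asks the backend
acl.clear()                                      # access is revoked
print(checker.can_access("Jones", url, now=30))  # cached answer, now stale
print(checker.can_access("Jones", url, now=61))  # cache expired, re-checked
print(checker.backend_calls)
```

The second call is answered from the cache without touching the security system, which is the "good" part. It is also wrong for up to one TTL after the revocation, which is the latency the notes refer to.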

Endeca

Out of the box is Early Binding only
Mitigated by low latency for document changes
Provides accurate document counts by user
General term is “Record Filters”
Or can use “joins” to a fulltext ACL index
RRN: Relational Record Navigation
Late binding via custom code

Microsoft SharePoint

Late binding
Microsoft calls it result trimming

Search Structures

Monolithic search
With a monolithic search, the index pulls everything across company boundaries. End users also run their searches in that one system. The spider has to have a super-logon to crawl all of the systems.

Federated search
Different search engines are in place. The federator queries each of the underlying systems and passes through the user's logon to run the search. Each search system runs the search its own way. The big problem is applying relevancy to the results from the federated search. You also have to deal with varying search syntaxes in the various underlying systems.
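
The relevancy problem shows up as soon as you try to merge results. In the rough sketch below, two made-up engines score on different scales, and the federator rescales each engine's scores before merging. The engines, their scores and the normalization step are all assumptions for illustration; this is one naive approach, not something the speakers recommended.

```python
# Toy federated-search sketch. Engines, scores and the normalization step
# are hypothetical.

def engine_a(user, query):
    # Pretend engine A scores results from 0 to 1000
    return [("a/doc1", 900.0), ("a/doc2", 300.0)]


def engine_b(user, query):
    # Pretend engine B scores results from 0 to 1
    return [("b/doc1", 0.8), ("b/doc2", 0.4)]


def normalize(results):
    """Scale one engine's scores so that its best hit scores 1.0."""
    top = max((score for _, score in results), default=0.0)
    if top <= 0:
        return [(doc, 0.0) for doc, _ in results]
    return [(doc, score / top) for doc, score in results]


def federated_search(user, query, engines):
    """Pass the user through to every engine, then merge the rescaled results."""
    merged = []
    for engine in engines:
        merged.extend(normalize(engine(user, query)))
    return sorted(merged, key=lambda pair: pair[1], reverse=True)


for doc, score in federated_search("Jones", "forecast", [engine_a, engine_b]):
    print(doc, round(score, 2))
```

Without the rescaling, everything from engine A would outrank everything from engine B simply because its numbers are bigger. Even with it, treating each engine's top hit as equally relevant is a guess, which is why relevancy is the hard part of federation.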

Deferred Search
For highly secured information, you provide a link to the separate silo, where the user needs to re-run the search inside that locked-down system.

(They went into even more technical stuff that went way over my head.)

Their slide deck is available on the Enterprise 2.0 Community Site.

Thursday, June 12, 2008
