
SfinxBis

Searching for people in web and LDAP directories is not easy, because you have to know the correct spelling of the name. For English names you can use Soundex or one of its variants to index names in a standard way, which makes it possible to search without knowing the exact spelling.
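For readers unfamiliar with Soundex, here is a minimal sketch of the classic American Soundex code, the family of algorithms that Phonix, and through it Sfinx and SfinxBis, builds on. It is shown only to illustrate the idea of phonetic indexing. It is not Phonix, Sfinx or SfinxBis, and it assumes ASCII input: letters outside A–Z (for example å, ä and ö) are simply dropped.

```python
# Letter-to-digit table for American Soundex (vowels, Y, H and W have no digit).
_CODES = {}
for letters, digit in (("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
                       ("L", "4"), ("MN", "5"), ("R", "6")):
    for ch in letters:
        _CODES[ch] = digit


def soundex(name: str) -> str:
    """Return the four-character American Soundex code for name ("" if no A-Z letters)."""
    letters = [ch for ch in name.upper() if "A" <= ch <= "Z"]
    if not letters:
        return ""
    first = letters[0]
    result = [first]                 # the first letter is always kept as-is
    prev = _CODES.get(first, "")     # its digit still suppresses an equal next digit
    for ch in letters[1:]:
        if ch in "HW":
            continue                 # H and W do not separate equal digits
        code = _CODES.get(ch, "")    # vowels and Y give "" and reset prev
        if code and code != prev:
            result.append(code)
        prev = code
    # Pad with zeros or truncate to exactly four characters.
    return ("".join(result) + "000")[:4]
```

With this sketch, `soundex("Pettersson")` and `soundex("Petersson")` both return `"P362"`, so the two spellings land under the same index key.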

In 1992, two students at Uppsala universitet wrote an undergraduate thesis on Swedish name indexing. They called their result Sfinx ("Svensk fonetisk namnindexering", Swedish phonetic name indexing). It is based on the phonetic algorithm Phonix, which is a Soundex variant.

The goal of SfinxBis is to make an implementation of an extended version of the thesis result available to all SWAMI members. The thesis itself is not available in electronic form.

SfinxBis is designed for surnames but also works rather well with given names. Each of a person's surnames and given names should be encoded separately. For example, Jan-Erik Pettersson Olson gives four codes, one each for Jan, Erik, Pettersson and Olson. To use SfinxBis effectively, store the SfinxBis codes for each name in your database or LDAP server.
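The advice to encode each name separately and store the codes can be sketched as a small in-memory index. This is a minimal illustration under stated assumptions: the function names are hypothetical, a Python dict stands in for the database or LDAP server, and `placeholder_encode` is a deliberately trivial stand-in (lowercase, collapse doubled letters) that is NOT the SfinxBis algorithm. A real SfinxBis encoder would be passed in its place.

```python
import re
from collections import defaultdict
from typing import Callable, Dict, List, Optional, Set


def placeholder_encode(name: str) -> str:
    """Hypothetical stand-in encoder, NOT SfinxBis: lowercase and collapse doubled letters."""
    out: List[str] = []
    for ch in name.lower():
        if not out or out[-1] != ch:
            out.append(ch)
    return "".join(out)


def name_parts(full_name: str) -> List[str]:
    """Split a full name on whitespace and hyphens into its individual names."""
    return [part for part in re.split(r"[\s-]+", full_name) if part]


def build_index(people: Dict[str, str],
                encode: Callable[[str], str]) -> Dict[str, Set[str]]:
    """Map each code to the ids of people who have a name part with that code."""
    index: Dict[str, Set[str]] = defaultdict(set)
    for person_id, full_name in people.items():
        for part in name_parts(full_name):   # one code per name, as advised above
            index[encode(part)].add(person_id)
    return index


def search(index: Dict[str, Set[str]], query: str,
           encode: Callable[[str], str]) -> Set[str]:
    """Return ids of people who have a name matching every part of query."""
    results: Optional[Set[str]] = None
    for part in name_parts(query):
        ids = index.get(encode(part), set())
        results = set(ids) if results is None else results & ids
    return results or set()
```

For example, after `idx = build_index({"p1": "Jan-Erik Pettersson Olson", "p2": "Anna Olsson"}, placeholder_encode)`, the query `search(idx, "Erik Olsson", placeholder_encode)` returns `{"p1"}`, even though the stored spelling is Olson.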

Files


SfinxBis algorithm specification (PDF)
Reference implementation of SfinxBis in Java, with the algorithm specification in Swedish
Implementation of SfinxBis in C# (C# ZIP file, ASP.NET demo on IIS)
LDAP schema for SfinxBis
List of surnames for testing implementations of SfinxBis, including erroneous entries such as empty names (row 1 is empty) and numbers
Expected results for the test file

If you would like to contribute implementations of SfinxBis in other programming languages (for example PHP, ColdFusion or C++), please contact Pål Axelsson at Uppsala universitet.
