RCIS data
RCIS is a privately held company, set up in 2007 by a former search engine developer with 13 years of experience in search, indexing, data warehousing and performance optimization.
The founder is Microsoft certified and a Sun SCA contributor to the MySQL project.
RCIS is essentially a research company that is trying to crack the code of indexing the web while maintaining acceptable relevance to a user's query.
We have been doing research for about 3 years now and have tested many techniques, assumptions, patents and algorithms.
Those 3 years of research have given us considerable insight into how search engines like Google and Yahoo work internally.
A prototype is currently being built to test this knowledge.
March 2010: We have recently started adopting Hadoop in our engine core. Hadoop provides an infrastructure and framework for
creating distributed, multi-node applications. Parallelizing tasks across nodes is required to analyze, reduce
and rank the huge amount of data collected by our (web) crawlers. The Ylumi search engine project is still an experimental
prototype, created by a former search engine developer. Its main purpose is to provide insight into the specific problems
encountered in search engine development for the World Wide Web. The project might evolve into a public search engine within a few years.
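As a rough illustration of the "analyze, reduce and rank" style of processing mentioned above, the following is a minimal, single-process sketch of the map/reduce pattern in Python. It does not use the Hadoop API and is not the Ylumi code; the record layout and the function names (`map_links`, `shuffle`, `reduce_counts`, `inbound_link_counts`) are assumptions made purely for illustration. It counts how many distinct pages link to each target, a simple signal that a ranking step could build on.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple

# Hypothetical crawl output: (source page, list of outbound link targets).
CrawlRecord = Tuple[str, List[str]]


def map_links(record: CrawlRecord) -> Iterator[Tuple[str, int]]:
    """Map step: emit (target, 1) once per distinct target a page links to."""
    source, targets = record
    for target in set(targets):
        if target != source:  # ignore self-links
            yield target, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Group emitted values by key, as the framework would between map and reduce."""
    grouped: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_counts(key: str, values: List[int]) -> Tuple[str, int]:
    """Reduce step: sum the counts collected for one target."""
    return key, sum(values)


def inbound_link_counts(records: Iterable[CrawlRecord]) -> Dict[str, int]:
    """Run map, shuffle and reduce in-process over all crawl records."""
    pairs = (pair for record in records for pair in map_links(record))
    return dict(reduce_counts(k, v) for k, v in shuffle(pairs).items())


if __name__ == "__main__":
    sample = [
        ("a.example", ["b.example", "c.example"]),
        ("b.example", ["c.example", "c.example"]),
        ("c.example", ["c.example"]),
    ]
    for site, count in sorted(inbound_link_counts(sample).items()):
        print(site, count)
```

In a real cluster the map and reduce functions would run on different nodes and the grouping step would be handled by the framework; here everything runs in one process so the data flow is easy to follow.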
July 2010: We are pleased to announce that we have successfully ported our central storage system to Hypertable.
Hypertable is an open source distributed database modeled on Google's Bigtable, which gives us a virtually unlimited amount of storage at our fingertips.
We have also redesigned our service application layer and our network architecture, and tried out some interesting new techniques.
Machines can now be plugged into the network and are configured automatically using BOOTP. Additionally, the machines are now able to
perform any task, anywhere in the network. We think this little revolution is a big step in the right direction toward ultimately achieving our goals.
August 2010: We are currently migrating our provider-hub nodes to Apache APR (the Apache Portable Runtime), a portability library that we use as the basis for high-performance TCP servers.
December 2010: Hypertable integration has been cancelled. We were not able to get it running reliably under VMware because of driver incompatibilities.
Until further notice we will be using HBase instead. We have abstracted the storage classes so that we can move back to Hypertable in the future. HBase is, for now, our distributed
database of choice.
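The idea of abstracting the storage classes so that the backend can be swapped can be sketched as follows. This is an assumption-laden illustration in Python, not the actual RCIS code: the interface `TableStore`, the stand-in backend `InMemoryStore`, the helpers `save_page` and `load_page`, and the column name `content:html` are all hypothetical. A real backend would wrap an HBase or Hypertable client behind the same two methods.

```python
from abc import ABC, abstractmethod
from typing import Dict, Optional, Tuple


class TableStore(ABC):
    """Hypothetical storage interface: the rest of the engine only sees this."""

    @abstractmethod
    def put(self, row: str, column: str, value: bytes) -> None:
        """Store one cell value under (row, column)."""

    @abstractmethod
    def get(self, row: str, column: str) -> Optional[bytes]:
        """Return the cell value, or None if the cell does not exist."""


class InMemoryStore(TableStore):
    """Stand-in backend for illustration; keeps cells in a plain dict."""

    def __init__(self) -> None:
        self._cells: Dict[Tuple[str, str], bytes] = {}

    def put(self, row: str, column: str, value: bytes) -> None:
        self._cells[(row, column)] = value

    def get(self, row: str, column: str) -> Optional[bytes]:
        return self._cells.get((row, column))


def save_page(store: TableStore, url: str, html: bytes) -> None:
    """Caller code depends only on TableStore, not on a concrete backend."""
    store.put(url, "content:html", html)


def load_page(store: TableStore, url: str) -> Optional[bytes]:
    """Fetch a previously stored page, or None if it was never saved."""
    return store.get(url, "content:html")


if __name__ == "__main__":
    store = InMemoryStore()
    save_page(store, "http://example.nl/", b"<html></html>")
    print(load_page(store, "http://example.nl/"))
```

Because the callers only depend on the interface, switching the database means adding one new subclass and changing the place where the store is constructed.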
January 2011: Our distributed crawler, based on Hadoop, is nearly complete. Currently 6 machines in the network are assigned as slaves and 2 as masters.
The distributed crawler's first job will be to index the Dutch websites that have a fairly high authority rank, i.e. sites that have been around
for quite a while. Our crawler is able to crawl and parse about 3000 URLs per second and has been given the name "YlumiBot 1.0". This first job
will teach us a great deal about how the Dutch web is linked. This intelligence will be used to improve indexing, tagging, search and ranking for Ylumi SE.
"We (always) optimize for ultra-speed."
Status SE: Hypertable implementation complete, other work in progress.
Related sites:
Mercuriusgids
Last updated: July 2010