http://blog.lingr.com/2007/05/a_new_plugin.html for detail).
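Many Rails fans have probably heard of Ferret, a full-text search engine based on Lucene. With acts_as_ferret ("acting as a ferret") installed, the mysterious one-liner that lovers of lightweight tools adore; O'Reilly's Ferret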
GENERIC_ANALYSIS_REGEX = /([a-zA-Z]|[\xc0-\xdf][\x80-\xbf])+|[0-9]+|[\xe0-\xef][\x80-\xbf][\x80-\xbf]/
GENERIC_ANALYZER = Ferret::Analysis::RegExpAnalyzer.new(GENERIC_ANALYSIS_REGEX, true)
Then, in the model you want to make searchable, add:
acts_as_ferret({:fields => [ FIELDS_YOU_WANT_TO_INDEX ] }, { :analyzer => GENERIC_ANALYZER })
Model.find_by_contents("hola")
the UTF-8 regex from jcode.rb (in other words, it exploits the way UTF-8 is encoded) to pick out the characters that are actually in the ranges U+80 ~ U+7FF and U+800 ~ U+FFFF. Of course,
def test_token_stream(token_stream)
  puts "Start | End | PosInc | Text"
  while t = token_stream.next
    puts "%5d |%4d |%5d | %s" % [t.start, t.end, t.pos_inc, t.text]
  end
end
Then, in irb:
str = "Café Österreich 是一間開在仮想現実空間(サイバースペース)裡的咖啡店"
test_token_stream(Ferret::Analysis::RegExpTokenizer.new(str, GENERIC_ANALYSIS_REGEX))
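If you do not have Ferret at hand, the byte ranges in the regex can still be checked with plain Ruby. The sketch below is only an approximation, not Ferret's tokenizer: it assumes Ruby 1.9 or later (hence the `/n` flag and the `String#b` call, neither of which appears in the original code), it does not lowercase tokens or report offsets, and the names `BYTE_TOKEN_REGEX`, `utf8_lead_byte`, and `utf8_tokens` are made up for illustration.

```ruby
# Same pattern as GENERIC_ANALYSIS_REGEX, with the /n flag added (assumption:
# Ruby 1.9+) so that it matches raw bytes rather than decoded characters.
BYTE_TOKEN_REGEX = /([a-zA-Z]|[\xc0-\xdf][\x80-\xbf])+|[0-9]+|[\xe0-\xef][\x80-\xbf][\x80-\xbf]/n

# Hypothetical helper: first byte of the UTF-8 encoding of a code point.
def utf8_lead_byte(codepoint)
  [codepoint].pack("U").bytes.first
end

# Hypothetical helper: split text into tokens using String#scan on the raw
# bytes. The regex contains a capture group, so the whole match is read from
# Regexp.last_match instead of scan's return value.
def utf8_tokens(text)
  tokens = []
  text.b.scan(BYTE_TOKEN_REGEX) do
    tokens << Regexp.last_match[0].force_encoding(Encoding::UTF_8)
  end
  tokens
end

# U+80 ~ U+7FF encode as two bytes, U+800 ~ U+FFFF as three bytes.
[0x80, 0x7FF, 0x800, 0xFFFF].each do |cp|
  encoded = [cp].pack("U")
  puts "U+%04X -> %d bytes, lead byte 0x%02X" % [cp, encoded.bytesize, utf8_lead_byte(cp)]
end

p utf8_tokens("Café 咖啡 42")
```

Latin letters with two-byte accented characters are grouped into one token, digits form their own token, and each three-byte character becomes a single-character token. Characters above U+FFFF (four-byte sequences) are not matched by this pattern at all.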
lukhnos :: May.17.2007 ::
tekhnologia (technology or art) ::