http://blog.lingr.com/2007/05/a_new_plugin.html for detail).
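Many Rails fans have probably heard of Ferret. Ferret is a full-text search engine based on Lucene. Once the "act as a ferret" plugin (acts_as_ferret) is installed, the magic one-liner that lightweight-minded developers love most O'Reilly's Ferret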
GENERIC_ANALYSIS_REGEX = /([a-zA-Z]|[\xc0-\xdf][\x80-\xbf])+|[0-9]+|[\xe0-\xef][\x80-\xbf][\x80-\xbf]/
GENERIC_ANALYZER = Ferret::Analysis::RegExpAnalyzer.new(GENERIC_ANALYSIS_REGEX, true)
Then, in the model you want to make searchable, add:
acts_as_ferret({:fields => [ FIELDS_YOU_WANT_TO_INDEX ] }, { :analyzer => GENERIC_ANALYZER })
Model.find_by_contents("hola")
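the regex in jcode.rb that handles UTF-8 (that is, it exploits the byte-level properties of UTF-8) to find the characters that are actually in the ranges U+80 ~ U+7FF and U+800 ~ U+FFFF. Of course,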
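The link between the byte classes in the regex and those code point ranges can be checked directly. The following is only a small sketch in plain Ruby (1.9 or later); `utf8_shape` is a hypothetical helper name introduced here, not something from jcode.rb or Ferret. It encodes a code point as UTF-8 and reports how many bytes it takes and what the lead byte is.

```ruby
# Hypothetical helper (not part of jcode.rb or Ferret): encode one code point
# as UTF-8 and return [number_of_bytes, lead_byte].
def utf8_shape(codepoint)
  bytes = [codepoint].pack("U").bytes
  [bytes.length, bytes.first]
end

# Boundaries of the two ranges mentioned in the text.
[0x80, 0x7FF, 0x800, 0xFFFF].each do |cp|
  length, lead = utf8_shape(cp)
  puts "U+%04X -> %d bytes, lead byte 0x%02X" % [cp, length, lead]
end
```

Two-byte sequences start with a lead byte in 0xC2..0xDF and three-byte sequences with one in 0xE0..0xEF, each followed by continuation bytes in 0x80..0xBF. The character class `[\xc0-\xdf]` is slightly wider than needed, since 0xC0 and 0xC1 never occur in valid UTF-8, but that does no harm on well-formed input.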
# Print every token in the stream with its offsets and position increment.
def test_token_stream(token_stream)
  puts "Start | End | PosInc | Text"
  while t = token_stream.next
    puts "%5d |%4d |%5d | %s" % [t.start, t.end, t.pos_inc, t.text]
  end
end
Then, in irb:
str = "Café Österreich 是一間開在仮想現実空間(サイバースペース)裡的咖啡店"
test_token_stream(Ferret::Analysis::RegExpTokenizer.new(str, GENERIC_ANALYSIS_REGEX))
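To see roughly what the pattern splits out without Ferret installed, the same regex can be applied in plain Ruby. This is a sketch under assumptions, not Ferret's implementation: `SKETCH_REGEX` and `sketch_tokens` are hypothetical names, the regex carries the `/n` flag and the input is treated as raw bytes so that it runs on Ruby 1.9 or later, the outer group is made non-capturing, and the offsets reported are byte offsets.

```ruby
# Byte-oriented copy of the pattern; /n makes the regex binary (ASCII-8BIT).
SKETCH_REGEX = /(?:[a-zA-Z]|[\xc0-\xdf][\x80-\xbf])+|[0-9]+|[\xe0-\xef][\x80-\xbf][\x80-\xbf]/n

# Hypothetical stand-in for a tokenizer: returns [start, end, text] triples,
# where start and end are byte offsets into the UTF-8 input.
def sketch_tokens(text)
  tokens = []
  text.b.scan(SKETCH_REGEX) do
    m = Regexp.last_match
    tokens << [m.begin(0), m.end(0), m[0].force_encoding(Encoding::UTF_8)]
  end
  tokens
end

puts "Start | End | Text"
sketch_tokens("Caf\u00e9 \u662f\u5e97").each do |start, finish, txt|
  puts "%5d |%4d | %s" % [start, finish, txt]
end
```

Runs of Latin letters, including two-byte accented ones, stay together as one token, while every three-byte character becomes a token of its own, which is the behaviour one wants for CJK text when no word segmenter is available.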
lukhnos :: May.17.2007 ::
tekhnologia (technique, or art) ::