
Search Results for 'kcode'

Processing Taiwanese Languages with Ruby: Formosa

Formosa on RubyForge: lib-formosa

gem install formosa

After that, you can use it in a Ruby program:

$KCODE = "u"
require "rubygems"
require "active_support"
require "formosa"
include Formosa::Holo
poj = SyllableType::POJ
tl = SyllableType::TL

# Convert a POJ syllable written in ASCII form into TL; this example takes goa2 as input and outputs guá
SyllableUtility.compose_syllable(poj, tl, "goa2")

The next example converts the first sentence of the Min Nan edition of Wikipedia into ASCII form:

SyllableUtility.convert_text_into_query_form(0, "Hoan-gêng lâi Wikipedia ê Holopedia hōng-tê!")
SyllableUtility.compose_syllable(poj, tl, "Hoan-geng5 lai5 Wikipedia e5 Holopedia hong7-te5!")
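
To make the "ASCII form" concrete, here is a minimal pure-Ruby sketch of the idea, written without Formosa. It is not how the library implements the conversion. It assumes the common POJ convention that the acute, grave, circumflex, macron, and vertical-line marks stand for tones 2, 3, 5, 7, and 8, and the names `TONE_NUMBER_FOR_MARK` and `to_ascii_tone_form` are made up for this illustration. It needs Ruby 2.2 or later for `String#unicode_normalize`.

```ruby
# Hypothetical helper, not part of Formosa: rewrite tone diacritics as
# trailing tone numbers, one syllable at a time.
TONE_NUMBER_FOR_MARK = {
  "\u0301" => "2", # combining acute accent
  "\u0300" => "3", # combining grave accent
  "\u0302" => "5", # combining circumflex accent
  "\u0304" => "7", # combining macron
  "\u030D" => "8"  # combining vertical line above
}.freeze

COMBINING_MARK = /[\u0300-\u036F]/
# A syllable here is a run of letters and combining marks; hyphens,
# spaces, and punctuation separate syllables and are left untouched.
SYLLABLE = /[[:alpha:]\u0300-\u036F]+/

def to_ascii_tone_form(text)
  # NFD splits each accented letter into a base letter plus a combining mark.
  text.unicode_normalize(:nfd).gsub(SYLLABLE) do |syllable|
    tone = ""
    base = syllable.gsub(COMBINING_MARK) do |mark|
      number = TONE_NUMBER_FOR_MARK[mark]
      if number
        tone = number
        "" # drop the tone mark from the syllable
      else
        mark # keep marks that are not tone marks
      end
    end
    # Recompose any marks that were kept, then append the tone number.
    (base + tone).unicode_normalize(:nfc)
  end
end

puts to_ascii_tone_form("Hoan-gêng lâi Wikipedia ê Holopedia hōng-tê!")
```

Syllables without a tone mark, such as Hoan or Wikipedia, pass through unchanged, so the sketch turns the sample sentence into the same ASCII string that the second call above takes as input.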

Glossary


lukhnos :: Jul.30.2007 :: tekhnologia 技術或者藝術 :: 2 Comments »

acts_as_ferret: A Quick Start to Full-Text Search in Rails (with Chinese, Japanese, and Korean Support)

(See http://blog.lingr.com/2007/05/a_new_plugin.html for details.)

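Many Rails fans have probably heard of Ferret, a full-text search engine based on Lucene. Once you install acts_as_ferret ("turn into a ferret") … the mysterious one-liner that lovers of lightweight tools adore … O'Reilly's Ferret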


GENERIC_ANALYSIS_REGEX = /([a-zA-Z]|[\xc0-\xdf][\x80-\xbf])+|[0-9]+|[\xe0-\xef][\x80-\xbf][\x80-\xbf]/
GENERIC_ANALYZER = Ferret::Analysis::RegExpAnalyzer.new(GENERIC_ANALYSIS_REGEX, true)

Then add the following to each model you want to make searchable:

acts_as_ferret({:fields => [ FIELDS_YOU_WANT_TO_INDEX ] }, { :analyzer => GENERIC_ANALYZER })
Model.find_by_contents("hola")


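… the regex that jcode.rb uses to handle UTF-8 (that is, it exploits the properties of the UTF-8 encoding) to pick out the characters that actually fall in the ranges U+80 to U+7FF and U+800 to U+FFFF. Of course, …
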
def test_token_stream(token_stream)
  puts "Start | End | PosInc | Text"
  while t = token_stream.next
    puts "%5d |%4d |%5d   | %s" % [t.start, t.end, t.pos_inc, t.text]
  end
end

Then, in irb:

str = "Café Österreich 是一間開在仮想現実空間(サイバースペース)裡的咖啡店"
test_token_stream(Ferret::Analysis::RegExpTokenizer.new(str, GENERIC_ANALYSIS_REGEX))
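
To see what the byte ranges in the regex do without installing Ferret, the same pattern can be tried with plain `String#scan`. This is only a sketch under a few assumptions: it targets Ruby 2.0 or later (for `String#b`), it adds the `/n` flag so the pattern is matched byte by byte, it swaps the capturing group for a non-capturing one so that `scan` returns whole matches, and the names `BYTE_TOKEN_REGEX` and `utf8_byte_tokens` are made up for this illustration. Ferret's own tokenizer may behave differently in details such as lowercasing and offsets.

```ruby
# Byte-oriented copy of the pattern. The /n flag makes the regex binary,
# so \xc0-\xdf and friends refer to raw bytes, not to characters.
#   [\xc0-\xdf][\x80-\xbf]            two-byte sequences (U+80 to U+7FF)
#   [\xe0-\xef][\x80-\xbf][\x80-\xbf] three-byte sequences (U+800 to U+FFFF)
BYTE_TOKEN_REGEX = /(?:[a-zA-Z]|[\xc0-\xdf][\x80-\xbf])+|[0-9]+|[\xe0-\xef][\x80-\xbf][\x80-\xbf]/n

# Hypothetical helper: split UTF-8 text into tokens with the regex above.
def utf8_byte_tokens(text)
  # String#b returns a binary copy, which a /n regex can match against.
  text.b.scan(BYTE_TOKEN_REGEX).map do |token|
    # Each match is a run of complete UTF-8 sequences, so it can be
    # relabelled as UTF-8 for display.
    token.force_encoding("UTF-8")
  end
end

p utf8_byte_tokens("Café 123 咖啡")
# => ["Café", "123", "咖", "啡"]
```

Latin letters and two-byte characters are grouped into words, digits form their own tokens, and every three-byte character, which covers most Chinese, Japanese, and Korean text, becomes a single-character token.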
lukhnos :: May.17.2007 :: tekhnologia 技術或者藝術 :: 5 Comments »