
Search Results for 'kcode'

Processing Taiwanese Languages with Ruby: Formosa

Formosa (lib-formosa) on RubyForge

gem install formosa

Then you can use it in a Ruby program:

$KCODE = "u"
require "rubygems"
require "active_support"
require "formosa"
include Formosa::Holo
poj = SyllableType::POJ
tl = SyllableType::TL

# Convert a POJ syllable in ASCII form to TL; this example takes goa2 as input and outputs guá
SyllableUtility.compose_syllable(poj, tl, "goa2")

The following example converts the first sentence of the Min Nan Wikipedia into ASCII form:

SyllableUtility.convert_text_into_query_form(0, "Hoan-gêng lâi Wikipedia ê Holopedia hōng-tê!")
SyllableUtility.compose_syllable(poj, tl, "Hoan-geng5 lai5 Wikipedia e5 Holopedia hong7-te5!")
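
To make the "ASCII form" concrete, here is a minimal sketch of the idea in plain Ruby (2.2 or later, for `String#unicode_normalize`). It is not how Formosa implements `convert_text_into_query_form`; `poj_to_numbered` and `POJ_TONE_MARKS` are hypothetical names, and the sketch simply assumes the usual POJ convention of tone diacritics mapped to tone numbers. It would also rewrite accented words that are not POJ at all.

```ruby
# Combining marks used for POJ tones, mapped to tone numbers.
# Tones 1 and 4 carry no mark, so unmarked syllables get no number here.
POJ_TONE_MARKS = {
  "\u0301" => "2", # acute
  "\u0300" => "3", # grave
  "\u0302" => "5", # circumflex
  "\u0304" => "7", # macron
  "\u030D" => "8"  # vertical line above
}.freeze

# Hypothetical helper, not part of Formosa: strip the tone mark from each
# run of letters and append the matching tone number instead.
def poj_to_numbered(text)
  text.gsub(/[\p{L}\p{M}]+/) do |syllable|
    tone = nil
    # Decompose so that the tone mark becomes a separate combining character.
    letters = syllable.unicode_normalize(:nfd).each_char.reject do |ch|
      tone = POJ_TONE_MARKS[ch] if POJ_TONE_MARKS.key?(ch)
      POJ_TONE_MARKS.key?(ch)
    end
    # Recompose whatever is left (for example non-tone diacritics).
    letters.join.unicode_normalize(:nfc) + tone.to_s
  end
end

puts poj_to_numbered("Hoan-gêng lâi Wikipedia ê Holopedia hōng-tê!")
```

Hyphens, spaces and punctuation are left alone because the pattern only matches runs of letters and combining marks.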

Glossary


lukhnos :: Jul.30.2007 :: tekhnologia 技術或者藝術 :: 2 Comments »

acts_as_ferret: A Quick Start to Full-Text Search in Rails (with CJK Support)

http://blog.lingr.com/2007/05/a_new_plugin.html for detail).

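Many Rails fans have probably heard of Ferret, a full-text search engine based on Lucene. With acts_as_ferret installed … the mysterious one line that lovers of lightweight tools adore … O'Reilly's Ferret …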


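# A token is a run of ASCII letters and two-byte UTF-8 sequences, a run of digits,
# or a single three-byte UTF-8 sequence (for example one CJK character).
GENERIC_ANALYSIS_REGEX = /([a-zA-Z]|[\xc0-\xdf][\x80-\xbf])+|[0-9]+|[\xe0-\xef][\x80-\xbf][\x80-\xbf]/
GENERIC_ANALYZER = Ferret::Analysis::RegExpAnalyzer.new(GENERIC_ANALYSIS_REGEX, true)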

Then, in the model you want to make searchable, add:

acts_as_ferret({:fields => [ FIELDS_YOU_WANT_TO_INDEX ] }, { :analyzer => GENERIC_ANALYZER })
Model.find_by_contents("hola")
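
The byte ranges in GENERIC_ANALYSIS_REGEX line up with how UTF-8 encodes code points: a lead byte in 0xC0-0xDF starts a two-byte sequence (U+0080 to U+07FF), a lead byte in 0xE0-0xEF starts a three-byte sequence (U+0800 to U+FFFF), and each is followed by continuation bytes in 0x80-0xBF. A small sketch for checking this, written for Ruby 1.9 or later rather than the Ruby 1.8 this post targets; `utf8_class` is a hypothetical helper, not part of Ferret or acts_as_ferret:

```ruby
# Hypothetical helper: classify one character by the lead byte of its
# UTF-8 encoding, using the same ranges as the regex.
def utf8_class(char)
  lead = char.encode("UTF-8").bytes.first
  case lead
  when 0x00..0x7F then :ascii
  when 0xC0..0xDF then :two_byte    # U+0080 to U+07FF
  when 0xE0..0xEF then :three_byte  # U+0800 to U+FFFF
  else :other # four-byte sequences (U+10000 and up) are not covered by the regex
  end
end

%w[a é 是 サ].each do |ch|
  puts "%s U+%04X %s" % [ch, ch.ord, utf8_class(ch)]
end
```

Accented Latin letters such as é fall in the two-byte class, which is why the regex groups them with ASCII letters into whole words, while each CJK character is a three-byte sequence and is matched on its own.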


… the UTF-8 regex in jcode.rb (in other words, it relies on the structure of UTF-8) to pick out the characters that actually fall in U+80 ~ U+7FF and U+800 ~ U+FFFF. Of course, …

def test_token_stream(token_stream)
  puts "Start | End | PosInc | Text"
  while t = token_stream.next
    puts "%5d |%4d |%5d   | %s" % [t.start, t.end, t.pos_inc, t.text]
  end
end

Then, in irb:

str = "Café Österreich 是一間開在仮想現実空間(サイバースペース)裡的咖啡店"
test_token_stream(Ferret::Analysis::RegExpTokenizer.new(str, GENERIC_ANALYSIS_REGEX))
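
To get a feel for the tokens this pattern yields without installing Ferret, the same byte ranges can be tried with `String#scan`. This is only a sketch for Ruby 2.0 or later: it is not Ferret's tokenizer, it reports no offsets or position increments, and `BYTE_PATTERN` and `utf8_tokens` are hypothetical names.

```ruby
# Same byte ranges as GENERIC_ANALYSIS_REGEX, but with a non-capturing group
# so that String#scan returns whole matches. The /n flag makes the regex binary.
BYTE_PATTERN = /(?:[a-zA-Z]|[\xc0-\xdf][\x80-\xbf])+|[0-9]+|[\xe0-\xef][\x80-\xbf][\x80-\xbf]/n

# Hypothetical helper: scan the raw bytes, then tag each match as UTF-8 again.
def utf8_tokens(str)
  str.b.scan(BYTE_PATTERN).map { |token| token.force_encoding("UTF-8") }
end

p utf8_tokens("Café Österreich 2007 雪貂")
```

With this input, the accented Latin words stay whole, the digits form one token, and each CJK character becomes a token of its own.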
lukhnos :: May.17.2007 :: tekhnologia 技術或者藝術 :: 5 Comments »