Search Results for 'kcode'

Processing Taiwanese Languages with Ruby: Formosa

Formosa on RubyForge: lib-formosa

gem install formosa

After that, you can use it in your Ruby programs:

$KCODE = "u"
require "rubygems"
require "active_support"
require "formosa"
include Formosa::Holo
poj = SyllableType::POJ
tl = SyllableType::TL

# Convert a POJ syllable written in ASCII form to TL; this example takes goa2 and outputs guá
SyllableUtility.compose_syllable(poj, tl, "goa2")

The next example converts the first sentence of the Min Nan edition of Wikipedia into ASCII form:

SyllableUtility.convert_text_into_query_form(0, "Hoan-gêng lâi Wikipedia ê Holopedia hōng-tê!")
SyllableUtility.compose_syllable(poj, tl, "Hoan-geng5 lai5 Wikipedia e5 Holopedia hong7-te5!")
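
The ASCII form used above can be illustrated without the library. The following is only a rough sketch, not Formosa's implementation: `poj_to_ascii` and `POJ_TONE_MARKS` are hypothetical names, the code assumes Ruby 2.2 or later (for `String#unicode_normalize`), and it handles only the five combining tone marks listed, ignoring other POJ letters such as the dotted o and the superscript n.

```ruby
# Hypothetical helper for illustration; not part of the Formosa gem.
# Maps combining tone marks (as they appear after NFD decomposition)
# to POJ tone numbers.
POJ_TONE_MARKS = {
  "\u0301" => 2, # acute accent
  "\u0300" => 3, # grave accent
  "\u0302" => 5, # circumflex
  "\u0304" => 7, # macron
  "\u030D" => 8  # vertical line above
}.freeze

def poj_to_ascii(text)
  # Decompose accented letters into base letter + combining mark.
  decomposed = text.unicode_normalize(:nfd)
  # Treat every run of letters and marks as one syllable candidate.
  converted = decomposed.gsub(/[\p{L}\p{M}]+/) do |syllable|
    mark = syllable.each_char.find { |c| POJ_TONE_MARKS.key?(c) }
    next syllable unless mark # unmarked syllables are left alone
    syllable.delete(mark) + POJ_TONE_MARKS[mark].to_s
  end
  # Recompose anything that was decomposed but not converted.
  converted.unicode_normalize(:nfc)
end

puts poj_to_ascii("Hoan-gêng lâi Wikipedia ê Holopedia hōng-tê!")
# => Hoan-geng5 lai5 Wikipedia e5 Holopedia hong7-te5!
```

Syllables without a tone mark (tones 1 and 4 in POJ) pass through unchanged, which is why plain words such as "Wikipedia" survive intact.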

Glossary

lukhnos :: Jul.30.2007 :: tekhnologia (technology or art) :: 2 Comments »

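acts_as_ferret: A Quick Start to Full-Text Search in Rails (with Chinese, Japanese, and Korean Support)

(See http://blog.lingr.com/2007/05/a_new_plugin.html for details.)

Many Rails fans have probably heard of Ferret. Ferret is a full-text search engine based on Lucene. Once you install acts_as_ferret (literally "acts as a ferret") [...] the mysterious one-liner that fans of lightweight tools love most [...] O'Reilly's Ferret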

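# Byte-level UTF-8 pattern: runs of ASCII letters and two-byte characters form one token,
# runs of digits form one token, and each three-byte character (e.g. CJK) is its own token.
GENERIC_ANALYSIS_REGEX = /([a-zA-Z]|[\xc0-\xdf][\x80-\xbf])+|[0-9]+|[\xe0-\xef][\x80-\xbf][\x80-\xbf]/
GENERIC_ANALYZER = Ferret::Analysis::RegExpAnalyzer.new(GENERIC_ANALYSIS_REGEX, true)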
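
The byte ranges in the pattern above line up with UTF-8 lead bytes: a first byte in 0xC0..0xDF starts a two-byte sequence, and a first byte in 0xE0..0xEF starts a three-byte sequence. The sketch below checks this for a few sample characters. It assumes Ruby 1.9 or later with UTF-8 source encoding, and `utf8_lead_class` is a hypothetical helper, not part of Ferret or jcode.rb.

```ruby
# Hypothetical helper for illustration only.
# Classifies a single character by the first byte of its UTF-8 encoding,
# using the same byte ranges as the analysis regex.
def utf8_lead_class(char)
  case char.bytes.first
  when 0x00..0x7f then :ascii        # one byte, U+0000..U+007F
  when 0xc0..0xdf then :two_bytes    # U+0080..U+07FF
  when 0xe0..0xef then :three_bytes  # U+0800..U+FFFF
  else :outside_pattern              # e.g. four-byte sequences
  end
end

%w[a é 咖].each do |ch|
  hex = ch.bytes.map { |b| format("%02x", b) }.join(" ")
  puts "#{ch}: #{hex} -> #{utf8_lead_class(ch)}"
end
```

Characters outside the Basic Multilingual Plane use four bytes, so neither branch of the pattern matches them and they would be skipped.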

Then, in each model you want to make searchable, add:

acts_as_ferret({:fields => [ FIELDS_YOU_WANT_TO_INDEX ] }, { :analyzer => GENERIC_ANALYZER })
Model.find_by_contents("hola")

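[...] the regex that jcode.rb uses to handle UTF-8 (in other words, it relies on the structure of the UTF-8 encoding) to pick out the characters that actually fall in the ranges U+80 to U+7FF and U+800 to U+FFFF. Of course, [...]
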
def test_token_stream(token_stream)
  puts "Start | End | PosInc | Text"
  while t = token_stream.next
    puts "%5d |%4d |%5d   | %s" % [t.start, t.end, t.pos_inc, t.text]
  end
end

Then, in irb:

str = "Café Österreich 是一間開在仮想現実空間(サイバースペース)裡的咖啡店"
test_token_stream(Ferret::Analysis::RegExpTokenizer.new(str, GENERIC_ANALYSIS_REGEX))
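
For readers without Ferret installed, the token boundaries can be approximated in plain Ruby. This is a rough sketch under several assumptions: it needs Ruby 1.9 or later, it rewrites the pattern with code point ranges instead of byte ranges, it reports byte offsets, and `TOKEN_PATTERN` and `token_offsets` are hypothetical names rather than anything Ferret provides.

```ruby
# Hypothetical stand-in for the byte-level regex, written with code point
# ranges: letters plus U+0080..U+07FF form words, digits form numbers,
# and each character in U+0800..U+FFFF is a token by itself.
TOKEN_PATTERN = /(?:[a-zA-Z]|[\u0080-\u07FF])+|[0-9]+|[\u0800-\uFFFF]/

# Returns [start_byte, end_byte, text] for every token in str.
def token_offsets(str)
  rows = []
  str.scan(TOKEN_PATTERN) do
    m = Regexp.last_match
    start = m.pre_match.bytesize # bytes before the match
    rows << [start, start + m[0].bytesize, m[0]]
  end
  rows
end

token_offsets("Café 123 咖啡").each do |start, stop, text|
  puts "%5d |%4d | %s" % [start, stop, text]
end
```

In this sketch "Café" stays together as one token while each CJK character becomes its own token, which is the behavior the analysis regex is meant to produce.
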

lukhnos :: May.17.2007 :: tekhnologia (technology or art) :: 5 Comments »