CKIP



Reference: https://github.com/ldkrsi/ckip_python

# python3
import re

from CKIP_python import CKIP_client

# Matches a POS tag in parentheses, e.g. "(Na)"
POS_PATTERN = re.compile(r'\([0-9A-Za-z_]+\)')


# Parse the returned result; tokens sometimes come back stuck together
def raw2ckip(inp):
    # First strip odd whitespace (non-breaking and full-width spaces)
    inp = inp.replace('\xa0', '').replace('\u3000', '')
    sentences = inp.split('\n')  # then split on newlines

    all_term = []
    all_pos = []
    for sentence in sentences:
        if sentence == '':
            continue
        result = CKIP_client.ckip_client(sentence)
        if result is None:
            continue
        for tp in result[0].split(' '):
            tags = POS_PATTERN.findall(tp)
            if len(tags) == 1:
                # Normal case: one term followed by one POS tag
                pos = tags[0]
                all_term.append(tp.replace(pos, ''))
                all_pos.append(pos.replace('(', '').replace(')', ''))
            elif len(tags) > 1:
                # Several term(POS) pairs stuck together: peel them off in order
                for p in tags:
                    new_term = tp.split(p)[0]
                    all_term.append(new_term)
                    all_pos.append(p.replace('(', '').replace(')', ''))
                    tp = tp.replace(new_term, '', 1).replace(p, '', 1)
            else:
                print('not found pos :' + tp)
    return all_term, all_pos
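
The parsing step can be tried without a CKIP connection. The sketch below is not part of the referenced library: `split_tagged_tokens` and the sample string are hypothetical, and the sample assumes the server returns each term immediately followed by a parenthesized POS tag, which is the shape the regex in `raw2ckip` expects. It handles glued tokens with a single regex pass instead of the two branches above.

```python
import re

# Assumed token shape: a term immediately followed by a parenthesized POS tag.
# \S*? is non-greedy, so glued pairs such as "A(Na)B(VC)" split at each tag.
TAGGED_TOKEN = re.compile(r'(\S*?)\(([0-9A-Za-z_]+)\)')


def split_tagged_tokens(segmented):
    """Split one segmented line into parallel term and POS-tag lists."""
    terms, pos_tags = [], []
    for match in TAGGED_TOKEN.finditer(segmented):
        terms.append(match.group(1))
        pos_tags.append(match.group(2))
    return terms, pos_tags


if __name__ == '__main__':
    # Hypothetical sample; the last two pairs are glued together on purpose.
    sample = '我(Nh) 喜歡(VK) 自然語言(Na)處理(VC)'
    terms, pos_tags = split_tagged_tokens(sample)
    print(list(zip(terms, pos_tags)))
```

Unlike `raw2ckip`, this version silently skips any token that carries no POS tag instead of printing it, so it may hide malformed output that the original would report.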
https://github.com/ldkrsi/ckip_python