Using text mining to find fresh new words ("小鲜词") on the web
Published: 2021-01-17 22:15:30 | Category: Big Data | Source: compiled from the web
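Summary: Before we start, take a look at the words that post-90s users like to use, as discovered on Renren (人人网). Fun, isn't it? Haha. The point of this article is to show a simple, automatic way to find new words in text, so you can tell what young people are into these days (for a blogger getting on in years like me, that is genuinely useful, sob). Project structure: of course, text.dat and common.d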
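Text selectors: picking out the strings that might be new words

CnTextSelector.java

package grid.text.selector;

import grid.common.TextUtils;

// Selector that only yields candidates made up entirely of Chinese characters.
public class CnTextSelector extends CommonTextSelector {

    public CnTextSelector(String document, int minSelectLen, int maxSelectLen) {
        super(document, minSelectLen, maxSelectLen);
    }

    @Override
    protected void adjustCurLen() {
        // Skip ahead to the next Chinese character.
        while (pos < docLen && !TextUtils.isCnLetter(document.charAt(pos))) {
            pos++;
        }
        // Cut the window at the first non-Chinese character inside it.
        for (int i = 0; i < maxSelectLen && pos + i < docLen; i++) {
            if (!TextUtils.isCnLetter(document.charAt(pos + i))) {
                curLen = i;
                if (curLen < minSelectLen) {
                    // The run is too short to be a candidate; retry from the next position.
                    pos++;
                    adjustCurLen();
                }
                return;
            }
        }
        // No non-Chinese character in the window: take the full length, clipped at the document end.
        curLen = pos + maxSelectLen > docLen ? docLen - pos : maxSelectLen;
    }
}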
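`TextUtils.isCnLetter` lives in `grid.common` and is not shown in this section. To give a rough idea of what such a check could look like, here is a minimal sketch that treats any character in the Unicode Han script as a Chinese letter. The class name `CnLetterSketch` is made up for illustration, and the real `TextUtils` may well use a different rule (for example a fixed code point range).

```java
// Hypothetical stand-in for grid.common.TextUtils.isCnLetter; the real helper is not shown.
final class CnLetterSketch {

    private CnLetterSketch() {
    }

    // Assumption: "Chinese letter" means a character whose Unicode script is Han.
    static boolean isCnLetter(char c) {
        return Character.UnicodeScript.of(c) == Character.UnicodeScript.HAN;
    }
}
```

Under this assumption, Latin letters, digits and punctuation (including full-width Chinese punctuation) all count as separators, which is what `adjustCurLen()` needs in order to cut candidates at non-Chinese characters.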
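CommonTextSelector.java

package grid.text.selector;

// Base selector: slides a window over the document and, at each position,
// yields substrings from the longest allowed length down to minSelectLen.
public class CommonTextSelector implements TextSelector {

    protected String document;
    protected int pos = 0;
    protected int maxSelectLen = 5;
    protected int minSelectLen = 2;
    protected int curLen;
    protected final int docLen;

    // minSelectLen must be a parameter here: CnTextSelector passes three arguments to super(...).
    public CommonTextSelector(String document, int minSelectLen, int maxSelectLen) {
        this.document = document;
        this.minSelectLen = minSelectLen;
        this.maxSelectLen = maxSelectLen;
        docLen = document.length();
        adjustCurLen();
    }

    // Accept the candidate last returned by next(). next() has already
    // decremented curLen, so ++curLen restores its length before jumping past it.
    @Override
    public void select() {
        pos += ++curLen;
        adjustCurLen();
    }

    // Reset the window to the longest length that still fits in the document.
    protected void adjustCurLen() {
        curLen = pos + maxSelectLen > docLen ? docLen - pos : maxSelectLen;
    }

    // Return the current candidate and shrink the window by one character.
    @Override
    public String next() {
        if (curLen < minSelectLen) {
            // All lengths at this position are used up; move one character forward.
            pos++;
            adjustCurLen();
        }
        if (pos + curLen <= docLen && curLen >= minSelectLen) {
            return document.substring(pos, pos + curLen--);
        } else {
            curLen--;
            // return document.substring(pos,docLen);
            return "";
        }
    }

    @Override
    public boolean end() {
        return curLen < minSelectLen && curLen + pos >= docLen - 1;
    }

    @Override
    public int getCurPos() {
        return pos;
    }
}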
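To make the window logic easier to picture, the following standalone sketch lists candidates in the order that repeated `next()` calls appear to produce them when `select()` is never called: at each position the longest substring comes first, then shorter ones down to the minimum length. It is a simplified illustration, not the author's class, and the name `WindowEnumerationSketch` is invented for this example.

```java
import java.util.ArrayList;
import java.util.List;

// Standalone illustration; not part of the article's grid.text.selector package.
final class WindowEnumerationSketch {

    private WindowEnumerationSketch() {
    }

    // Lists every substring of length minLen..maxLen, longest first at each position.
    static List<String> candidates(String document, int minLen, int maxLen) {
        List<String> result = new ArrayList<>();
        int docLen = document.length();
        for (int pos = 0; pos < docLen; pos++) {
            // Clip the window at the end of the document, as adjustCurLen() does.
            int longest = Math.min(maxLen, docLen - pos);
            for (int len = longest; len >= minLen; len--) {
                result.add(document.substring(pos, pos + len));
            }
        }
        return result;
    }
}
```

For the document `"abcd"` with lengths 2 to 3, `candidates` returns `abc`, `ab`, `bcd`, `bc`, `cd`. The real selector differs in that `select()` lets the caller skip past a candidate it has accepted, so overlapping substrings are not revisited.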
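TextSelector.java

package grid.text.selector;

public interface TextSelector {

    // True once no further candidates can be produced.
    public boolean end();

    // Accept the candidate last returned by next() and move past it.
    public void select();

    // Return the next candidate string (may be empty).
    public String next();

    // Current position in the document.
    public int getCurPos();
}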
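A caller would typically drive this interface with a loop over `end()` and `next()`. The sketch below shows one plausible way to do that, counting how often each candidate occurs. It is only an assumption about how the interface is meant to be used: the interface is copied locally so the example compiles on its own, and `FixedSelector` and `countCandidates` are hypothetical names that do not come from the article.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class SelectorDriverSketch {

    private SelectorDriverSketch() {
    }

    // Local copy of the article's interface so this sketch is self-contained.
    interface TextSelector {
        boolean end();
        void select();
        String next();
        int getCurPos();
    }

    // Hypothetical stand-in that replays a fixed list of candidates.
    static final class FixedSelector implements TextSelector {
        private final List<String> candidates;
        private int index = 0;

        FixedSelector(List<String> candidates) {
            this.candidates = candidates;
        }

        @Override
        public boolean end() {
            return index >= candidates.size();
        }

        @Override
        public void select() {
            // Nothing to skip in this stand-in.
        }

        @Override
        public String next() {
            return candidates.get(index++);
        }

        @Override
        public int getCurPos() {
            return index;
        }
    }

    // Drains the selector and counts each non-empty candidate.
    static Map<String, Integer> countCandidates(TextSelector selector) {
        Map<String, Integer> counts = new HashMap<>();
        while (!selector.end()) {
            String candidate = selector.next();
            if (candidate.isEmpty()) {
                continue; // next() has a branch that returns "", so skip empties
            }
            counts.merge(candidate, 1, Integer::sum);
        }
        return counts;
    }
}
```

Skipping empty strings matters because `CommonTextSelector.next()` returns `""` when no valid window is available. Whether and when the caller should also invoke `select()` depends on how candidates are judged to be words, which this section does not show.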
Test code


