Using the gensim library! 🐈

1. Text similarity

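from gensim.models import KeyedVectors
import jieba
import numpy as np

# Load a pretrained Word2Vec model
word2vec_model_path = 'path_to_pretrained_word2vec_model.bin'  # replace with the path to your pretrained model
word2vec_model = KeyedVectors.load_word2vec_format(word2vec_model_path, binary=True)

# Tokenize Chinese text with jieba
def tokenize(text):
    return list(jieba.cut(text))

# Compute the similarity between two sentences
def sentence_similarity(sentence1, sentence2):
    # Tokenize both sentences
    tokens1 = tokenize(sentence1)
    tokens2 = tokenize(sentence2)

    # Drop words the model does not know (this is out-of-vocabulary filtering,
    # not stop-word removal). `word in model` works in gensim 3.x and 4.x;
    # the `.vocab` attribute was removed in gensim 4.0.
    tokens1 = [word for word in tokens1 if word in word2vec_model]
    tokens2 = [word for word in tokens2 if word in word2vec_model]

    # If either sentence has no known words, the similarity is undefined; return 0.0
    if not tokens1 or not tokens2:
        return 0.0

    # Represent each sentence as the mean of its word vectors
    vector1 = word2vec_model[tokens1].mean(axis=0)
    vector2 = word2vec_model[tokens2].mean(axis=0)

    # Cosine similarity between the two sentence vectors
    similarity = np.dot(vector1, vector2) / (np.linalg.norm(vector1) * np.linalg.norm(vector2))

    return float(similarity)

# Test the sentence similarity
sentence1 = '我喜欢吃水果'  # "I like eating fruit"
sentence2 = '水果是我喜欢吃的'  # "Fruit is what I like to eat"
similarity_score = sentence_similarity(sentence1, sentence2)
print("Similarity between the two sentences:", similarity_score)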
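
To try out the averaging and cosine-similarity step without downloading a pretrained model, the following is a minimal sketch that uses only the Python standard library. The 3-dimensional vectors in `TOY_VECTORS` are made up for illustration, the helper names (`mean_vector`, `cosine_similarity`, `toy_sentence_similarity`) are hypothetical, and the token lists are an assumption about how the sentences might be segmented; none of this comes from gensim or jieba.

```python
import math

# Hypothetical 3-dimensional embeddings, invented for illustration only.
# A real Word2Vec model would supply vectors with hundreds of dimensions.
TOY_VECTORS = {
    "我": [0.1, 0.3, 0.5],
    "喜欢": [0.7, 0.2, 0.1],
    "吃": [0.4, 0.6, 0.2],
    "水果": [0.2, 0.8, 0.3],
    "是": [0.3, 0.1, 0.4],
    "的": [0.2, 0.2, 0.2],
}


def mean_vector(tokens, vectors):
    """Average the vectors of the tokens found in `vectors`; None if none are found."""
    known = [vectors[t] for t in tokens if t in vectors]
    if not known:
        return None
    dim = len(known[0])
    return [sum(vec[i] for vec in known) / len(known) for i in range(dim)]


def cosine_similarity(a, b):
    """Cosine of the angle between vectors a and b (0.0 if either has zero length)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def toy_sentence_similarity(tokens1, tokens2, vectors=TOY_VECTORS):
    """Mean-pool each token list, then compare the two sentence vectors."""
    v1 = mean_vector(tokens1, vectors)
    v2 = mean_vector(tokens2, vectors)
    if v1 is None or v2 is None:
        return 0.0  # no known words on one side: treat similarity as 0.0
    return cosine_similarity(v1, v2)


# Pre-tokenized input; the segmentation shown here is an assumption.
tokens_a = ["我", "喜欢", "吃", "水果"]
tokens_b = ["水果", "是", "我", "喜欢", "吃", "的"]
score = toy_sentence_similarity(tokens_a, tokens_b)
print("Toy similarity:", round(score, 4))
```

Because the sentence vector is a plain average, word order is ignored: two sentences made of the same words in a different order score 1.0 (up to floating-point rounding). That is worth keeping in mind when interpreting scores from the real model as well.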

#AI #NLP

https://yangchuanzhi20.github.io/2024/03/13/人工智能/NLP/库的使用/gensim库/

Author

白色很哇塞

Published on

March 13, 2024
