
现代矿业 (Modern Mining), 2022, Vol. 38, Issue 11: 266-.

• Practical Technology •

基于数据仓库的论文推荐方法研究

汪贻杰1 沙梦钒2 赵鹏2 周建平1   

  1. 安徽工业大学计算机科学与技术学院; 2. 中钢集团马鞍山矿山研究总院股份有限公司
  • Publication date: 2022-11-25 Release date: 2023-05-18

Research on Paper Recommendation Method Based on Data Warehouse

WANG Yijie1 SHA Mengfan2 ZHAO Peng2 ZHOU Jianping1   

  1. College of Computer Science and Technology, Anhui University of Technology; 2. Sinosteel Maanshan General Institute of Mining Research Co., Ltd.
  • Online: 2022-11-25 Published: 2023-05-18

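Abstract (translated from the Chinese): To advance convergence media for journals, strengthen the knowledge-service capability of the journal's website, and offer readers an online paper recommendation service, this work proposes a paper recommendation method based on a data warehouse. First, a data warehouse is built and a paper recommendation subject library is set up; the title, abstract, keywords, and similar fields of each paper in the data set are extracted to form a feature data set. Next, the feature data set is preprocessed into a semi-structured word-segmentation feature data set and stored in the ODS layer of the warehouse. The data in this raw data layer are formatted and put through ETL, records with missing dimension items are cleaned, and the result is stored in the DWD layer. A dimension-paper weight matrix is then constructed and stored in the DWS layer, while the ADS layer holds the aggregated recommendation results. Finally, at recommendation time, the similarity of the paper to be recommended is computed from the word-segmentation feature data set in the ADS-layer subject table, and similar literature is recommended for the target paper according to the similarity values. The results show that the method improves the timeliness and accuracy of paper recommendation and works well in practice.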

Keywords (Chinese): 数据仓库 (data warehouse), 文本相似度 (text similarity), Spark, 论文推荐 (paper recommendation)

Abstract: To promote the development of convergence media for journals, improve the knowledge-service capability of the journal's website, and provide readers with an online paper recommendation service, a paper recommendation method based on a data warehouse is proposed. First, a data warehouse is built and a paper recommendation subject library is established; the titles, abstracts, keywords, and other fields of the papers in the data set are extracted to build a feature data set. Then, a semi-structured word-segmentation feature data set is obtained by preprocessing this feature data set and is stored in the ODS layer of the data warehouse. The data in this raw data layer are formatted and processed by ETL, records with missing dimension items are cleaned, and the resulting data are stored in the DWD layer. A dimension-paper weight matrix is constructed and stored in the DWS layer, and the aggregated recommendation results are stored in the ADS layer, the top application layer. Finally, the similarity of the paper to be recommended is calculated from the word-segmentation feature data set in the ADS-layer subject table, and similar literature is recommended for the target paper according to the similarity values. The results show that the proposed method improves the timeliness and accuracy of paper recommendation and performs well in application.

Key words: data warehouse, text similarity, Spark, paper recommendation
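
The layered flow described in the abstract (ODS, DWD, DWS, ADS) can be illustrated with a small in-memory sketch. The abstract does not name the weighting scheme, the similarity measure, or the tokenizer, so everything concrete below is an assumption made for illustration: TF-IDF weights, cosine similarity, a regex tokenizer in place of a real word segmenter, plain Python dictionaries in place of Spark and warehouse tables, and hypothetical sample records.

```python
import math
import re
from collections import Counter

# Hypothetical sample records; the paper's real data set is not given.
RAW_PAPERS = [
    {"id": "p1", "title": "Data warehouse design for mining journals",
     "abstract": "A layered data warehouse stores paper features.",
     "keywords": "data warehouse; journal"},
    {"id": "p2", "title": "Paper recommendation with a data warehouse",
     "abstract": "Text similarity drives paper recommendation.",
     "keywords": "recommendation; text similarity"},
    {"id": "p3", "title": "Slope stability monitoring in open-pit mines",
     "abstract": "Radar monitoring of slope deformation.",
     "keywords": "slope; monitoring"},
    # Record with missing dimensions, to be cleaned out at the DWD step.
    {"id": "p4", "title": "Untitled", "abstract": "", "keywords": ""},
]

FIELDS = ("title", "abstract", "keywords")


def tokenize(text):
    """Stand-in for word segmentation: lowercase alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def build_ods(raw):
    """ODS layer: semi-structured, tokenized features per field."""
    return [{"id": r["id"], **{f: tokenize(r[f]) for f in FIELDS}} for r in raw]


def build_dwd(ods):
    """DWD layer: drop records in which any dimension is empty."""
    return [r for r in ods if all(r[f] for f in FIELDS)]


def build_dws(dwd):
    """DWS layer: sparse paper-by-term weight matrix (assumed TF-IDF)."""
    counts = {r["id"]: Counter(t for f in FIELDS for t in r[f]) for r in dwd}
    n = len(counts)
    df = Counter(t for c in counts.values() for t in c)  # document frequency
    idf = {t: math.log((1 + n) / (1 + d)) + 1.0 for t, d in df.items()}
    return {pid: {t: tf * idf[t] for t, tf in c.items()}
            for pid, c in counts.items()}


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def build_ads(dws, top_k=2):
    """ADS layer: for each paper, the top_k most similar other papers."""
    ads = {}
    for pid, vec in dws.items():
        scored = [(other, cosine(vec, dws[other])) for other in dws if other != pid]
        scored.sort(key=lambda pair: pair[1], reverse=True)
        ads[pid] = scored[:top_k]
    return ads


ads = build_ads(build_dws(build_dwd(build_ods(RAW_PAPERS))))
for paper_id, recommendations in ads.items():
    print(paper_id, recommendations)
```

Precomputing the ranked lists in the last step mirrors the abstract's point that the ADS layer holds aggregated recommendation results, so a lookup at request time avoids recomputing similarities. In a Spark deployment each `build_*` function would presumably correspond to a job writing to its own layer's tables, but that mapping is an assumption rather than something the abstract states.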