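Requirements for this section
Extend the knowledge base parsing with Git repository support. The user enters a Git repository URL together with a user name and token, and the service clones the code and uploads it to the knowledge base.
Implementation
1. Project structure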
```
ai-rag-knowledge/
├── xfg-dev-tech-api/               # API layer (service contracts)
│   ├── IRAGService.java            # RAG service interface
│   └── IAiService.java             # AI service interface
├── xfg-dev-tech-app/               # Application layer (startup + infrastructure wiring)
│   ├── Application.java            # Spring Boot entry point
│   ├── config/
│   │   ├── OllamaConfig.java
│   │   ├── RedisClientConfig.java
│   │   └── RedisClientConfigProperties.java
│   └── test/
│       └── RagPipelineTest.java    # Unit test
└── xfg-dev-tech-trigger/           # Trigger layer (external interfaces)
    ├── OllamaController.java       # Ollama HTTP controller
    └── RAGController.java          # RAG HTTP controller
```
2. Code
Add a new method to IRAGService.java:
```java
Response<String> analyzeGitRepository(String repoUrl, String userName, String token) throws Exception;
```
This is a POST endpoint: it clones the Git repository, parses the files it contains, and stores the parsed text in the vector database.
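To make the request shape concrete, here is a minimal sketch of a client call built with the JDK's `java.net.http` API. The parameter names `repoUrl`, `userName` and `token` come from the method signature; the base URL passed to `buildRequest` and the class name `AnalyzeGitRepositoryClient` are assumptions for illustration, and the real controller may sit behind a path prefix that is not shown here.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpRequest;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical helper, not part of the project.
final class AnalyzeGitRepositoryClient {

    // Encode the three request parameters as an application/x-www-form-urlencoded body.
    static String formBody(String repoUrl, String userName, String token) {
        Map<String, String> params = new LinkedHashMap<>();
        params.put("repoUrl", repoUrl);
        params.put("userName", userName);
        params.put("token", token);
        return params.entrySet().stream()
                .map(e -> URLEncoder.encode(e.getKey(), StandardCharsets.UTF_8)
                        + "=" + URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8))
                .collect(Collectors.joining("&"));
    }

    // Build (but do not send) the POST request. baseUrl is an assumption,
    // for example "http://localhost:8090" plus whatever prefix the controller uses.
    static HttpRequest buildRequest(String baseUrl, String repoUrl, String userName, String token) {
        return HttpRequest.newBuilder()
                .uri(URI.create(baseUrl + "/analyze_git_repository"))
                .header("Content-Type", "application/x-www-form-urlencoded")
                .POST(HttpRequest.BodyPublishers.ofString(formBody(repoUrl, userName, token)))
                .build();
    }
}
```

The request can then be sent with `HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString())`. Sending the token in a form body rather than in the query string keeps it out of URL-based access logs.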
The method is then implemented in RAGController.java. Here is the implementation, with a walkthrough:
```java
@RequestMapping(value = "analyze_git_repository", method = RequestMethod.POST)
@Override
public Response<String> analyzeGitRepository(@RequestParam String repoUrl,
                                             @RequestParam String userName,
                                             @RequestParam String token) throws Exception {
    String localPath = "./git-cloned-repo";
    String repoProjectName = extractProjectName(repoUrl);
    log.info("Clone path: {}", new File(localPath).getAbsolutePath());

    // Remove anything left over from a previous run before cloning.
    FileUtils.deleteDirectory(new File(localPath));

    Git git = null;
    try {
        // Clone the repository into the local working directory.
        git = Git.cloneRepository()
                .setURI(repoUrl)
                .setDirectory(new File(localPath))
                .setCredentialsProvider(new UsernamePasswordCredentialsProvider(userName, token))
                .call();

        // Walk the working tree, parse each file and upload it to the vector store.
        Files.walkFileTree(Paths.get(localPath), new SimpleFileVisitor<>() {
            @Nonnull
            @Override
            public FileVisitResult preVisitDirectory(Path dir, @Nonnull BasicFileAttributes attrs) throws IOException {
                // Skip Git's own metadata directory entirely.
                if (".git".equals(dir.getFileName().toString())) {
                    return FileVisitResult.SKIP_SUBTREE;
                }
                return FileVisitResult.CONTINUE;
            }

            @Nonnull
            @Override
            public FileVisitResult visitFile(Path file, @Nonnull BasicFileAttributes attrs) throws IOException {
                // Coarse guard: skips any file whose path contains ".git",
                // which also covers files such as .gitignore.
                if (file.toString().contains(".git")) {
                    return FileVisitResult.CONTINUE;
                }

                log.info("{} walking path, uploading to knowledge base: {}", repoProjectName, file.getFileName());
                try {
                    // Extract text with Tika, then split it into chunks.
                    TikaDocumentReader reader = new TikaDocumentReader(new PathResource(file));
                    List<Document> documents = reader.get();
                    List<Document> documentSplitterList = tokenTextSplitter.apply(documents);

                    // Tag both the source documents and the chunks with the project name.
                    documents.forEach(doc -> doc.getMetadata().put("knowledge", repoProjectName));
                    documentSplitterList.forEach(doc -> doc.getMetadata().put("knowledge", repoProjectName));

                    pgVectorStore.accept(documentSplitterList);
                } catch (Exception e) {
                    // A file that cannot be parsed is logged and skipped; the walk continues.
                    log.error("Failed to upload file to knowledge base: {}, error: {}", file.getFileName(), e.getMessage());
                }

                return FileVisitResult.CONTINUE;
            }

            @Nonnull
            @Override
            public FileVisitResult visitFileFailed(Path file, @Nonnull IOException exc) throws IOException {
                log.error("Failed to access file: {} - {}", file.toString(), exc.getMessage());
                return FileVisitResult.CONTINUE;
            }
        });

        // Register the project name in the Redis list of knowledge tags, once.
        RList<String> elements = redissonClient.getList("ragTag");
        if (!elements.contains(repoProjectName)) {
            elements.add(repoProjectName);
        }

        log.info("Walk finished, upload complete: {}", repoUrl);

        // The info text means "call succeeded".
        return Response.<String>builder().code("0000").info("调用成功").build();
    } finally {
        // Close the repository first so its files are no longer in use.
        if (git != null) {
            git.close();
        }

        try {
            // Wait one second before deleting the clone.
            Thread.sleep(1000);
            FileUtils.deleteDirectory(new File(localPath));
            log.info("Temporary directory cleaned up: {}", localPath);
        } catch (Exception e) {
            log.warn("Failed to clean up temporary directory: {}, error: {}", localPath, e.getMessage());
        }
    }
}

// Take the last URL segment and strip ".git" from it.
private String extractProjectName(String repoUrl) {
    String[] parts = repoUrl.split("/");
    String projectNameWithGit = parts[parts.length - 1];
    return projectNameWithGit.replace(".git", "");
}
```
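Two details of the implementation above are worth a closer look. `replace(".git", "")` removes every occurrence of `.git`, so a repository named `my.github.io.git` would be tagged `myhub.io`, and the `contains(".git")` check in `visitFile` also skips files such as `.gitignore` or anything under `.github/`. If that is not the intended behaviour, stricter helpers could look like the sketch below. `GitRepoPaths` and `isInsideGitDir` are hypothetical names, not part of the project.

```java
import java.nio.file.Path;

// Hypothetical helpers, shown as a possible stricter variant.
final class GitRepoPaths {

    // Use the last URL segment as the project name and drop only a trailing ".git".
    static String extractProjectName(String repoUrl) {
        String trimmed = repoUrl.trim();
        while (trimmed.endsWith("/")) {
            trimmed = trimmed.substring(0, trimmed.length() - 1);
        }
        String last = trimmed.substring(trimmed.lastIndexOf('/') + 1);
        return last.endsWith(".git")
                ? last.substring(0, last.length() - ".git".length())
                : last;
    }

    // True only when one of the path's components is exactly ".git".
    static boolean isInsideGitDir(Path path) {
        for (Path part : path) {
            if (".git".equals(part.toString())) {
                return true;
            }
        }
        return false;
    }
}
```

With these helpers, `visitFile` could call `isInsideGitDir(file)` in place of the substring check. Since `preVisitDirectory` already returns `SKIP_SUBTREE` for the `.git` directory, the check in `visitFile` is mostly a safety net either way.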
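Functional testing
Test the endpoint with Apifox:
Here I first connect to PostgreSQL with DBeaver to look at the records in the database:
We can see that the contents of our Git repository have been recorded in the database correctly.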
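The same check can be done without a GUI client by counting the stored chunks for one knowledge tag over JDBC. This is only a sketch under assumptions: the table name `vector_store` and the JSON `metadata` column follow Spring AI's default PgVectorStore schema, which this project may have overridden, and `KnowledgeRowCounter` is a hypothetical helper. Running it requires the PostgreSQL JDBC driver on the classpath.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

// Hypothetical helper, not part of the project.
final class KnowledgeRowCounter {

    // Assumption: default PgVectorStore table and metadata column names.
    // The "knowledge" key is the metadata tag written by analyzeGitRepository.
    static final String COUNT_BY_TAG_SQL =
            "SELECT COUNT(*) FROM vector_store WHERE metadata ->> 'knowledge' = ?";

    // Count the chunks stored under the given knowledge tag.
    static long countChunks(Connection connection, String knowledgeTag) throws SQLException {
        try (PreparedStatement ps = connection.prepareStatement(COUNT_BY_TAG_SQL)) {
            ps.setString(1, knowledgeTag);
            try (ResultSet rs = ps.executeQuery()) {
                rs.next();
                return rs.getLong(1);
            }
        }
    }
}
```

A count greater than zero for the repository's project name indicates that at least some files were parsed and stored.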
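Then I select the knowledge tag in the front end. Deepseek-r1:1.5b answers the question correctly based on the document contents, so the test passes.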