Fixing tags and titles that drift away from the original filename
<p style="border: 0px solid; margin-top: 0px; margin-bottom: 1rem; padding: 0px; color: rgb(254, 253, 253); font-family: Inter, "Inter Fallback"; font-size: 14px; background-color: rgb(21, 21, 21);">The core problem: the Chinese part of each filename is an inaccurate literal translation added later, while the English part is the accurate original description. Translation and tag generation should therefore work from the English part first.</p><ul style="border: 0px solid; margin-top: 0.5rem; margin-bottom: 1rem; padding: 0px 0px 0px 1.5rem; list-style-position: outside; list-style-image: initial; color: rgb(254, 253, 253); font-family: Inter, "Inter Fallback"; font-size: 14px; background-color: rgb(21, 21, 21);"><li style="border: 0px solid; margin: 0px 0px 0.5rem; padding: 0px;">Prefer the English part: it is the accurate original description, whereas the Chinese part may be an inaccurate literal translation added later</li><li style="border: 0px solid; margin: 0px 0px 0.5rem; padding: 0px;">Filter out meaningless content: 天途影像, quantity markers such as X 2, and mistranslated words</li><li style="border: 0px solid; margin: 0px 0px 0.5rem; padding: 0px;">Keep single Chinese numerals (一, 二, 三, and so on); they must not be filtered out</li><li style="border: 0px solid; margin: 0px 0px 0.5rem; padding: 0px;">Keep code changes committed to git</li></ul><h2 style="border: 0px solid; margin-top: 2rem; margin-bottom: 0.75rem; padding: 0px; font-size: 14px; font-weight: 500; color: rgb(254, 253, 253); line-height: 21px; font-family: Inter, "Inter Fallback"; background-color: rgb(21, 21, 21);">Discoveries</h2><ol style="border: 0px solid; margin-top: 0.5rem; margin-bottom: 1rem; padding: 0px 0px 0px 2.25rem; list-style-position: outside; list-style-image: initial; color: rgb(254, 253, 253); font-family: Inter, "Inter Fallback"; font-size: 14px; background-color: rgb(21, 21, 21);"><li style="border: 0px solid; margin: 0px 0px 0.5rem; padding: 0px;"><span style="border: 0px solid; margin: 0px; padding: 0px;">Root cause</span>: the <code style="border: 0px solid; margin: 0px; padding: 0px; font-family: "IBM Plex Mono", "IBM Plex Mono Fallback", ui-monospace, SFMono-Regular, Menlo, Monaco, Consolas, "Liberation Mono", "Courier New", monospace; font-feature-settings: "ss01"; font-variation-settings: normal; font-size: 1em; color: rgb(0, 206, 185);">_extract_english_part</code> method extracted English from the full path instead of from the filename alone</li><li 
style="border: 0px solid; margin: 0px 0px 0.5rem; padding: 0px;"><span style="border: 0px solid; margin: 0px; padding: 0px;">Multi-layer filtering</span>: several places apply a <code style="border: 0px solid; margin: 0px; padding: 0px; font-family: "IBM Plex Mono", "IBM Plex Mono Fallback", ui-monospace, SFMono-Regular, Menlo, Monaco, Consolas, "Liberation Mono", "Courier New", monospace; font-feature-settings: "ss01"; font-variation-settings: normal; font-size: 1em; color: rgb(0, 206, 185);">len(word) >= 2</code> condition, which filters out single Chinese numerals</li><li style="border: 0px solid; margin: 0px 0px 0.5rem; padding: 0px;"><span style="border: 0px solid; margin: 0px; padding: 0px;">Places that need changes</span>:<ul style="border: 0px solid; margin-top: 0.25rem; margin-bottom: 0.25rem; padding: 0px 0px 0px 1rem; list-style: outside disc;"><li style="border: 0px solid; margin: 0px 0px 0.5rem; padding: 0px;"><code style="border: 0px solid; margin: 0px; padding: 0px; font-family: "IBM Plex Mono", "IBM Plex Mono Fallback", ui-monospace, SFMono-Regular, Menlo, Monaco, Consolas, "Liberation Mono", "Courier New", monospace; font-feature-settings: "ss01"; font-variation-settings: normal; font-size: 1em; color: rgb(0, 206, 185);">_extract_semantic_words</code>: semantic extraction</li><li style="border: 0px solid; margin: 0px 0px 0.5rem; padding: 0px;"><code style="border: 0px solid; margin: 0px; padding: 0px; font-family: "IBM Plex Mono", "IBM Plex Mono Fallback", ui-monospace, SFMono-Regular, Menlo, Monaco, Consolas, "Liberation Mono", "Courier New", monospace; font-feature-settings: "ss01"; font-variation-settings: normal; font-size: 1em; color: rgb(0, 206, 185);">_filter_meaningless_tags</code>: tag filtering</li><li style="border: 0px solid; margin: 0px 0px 0.5rem; padding: 0px;"><code style="border: 0px solid; margin: 0px; padding: 0px; font-family: "IBM Plex Mono", "IBM Plex Mono Fallback", ui-monospace, SFMono-Regular, Menlo, Monaco, Consolas, "Liberation Mono", "Courier New", monospace; font-feature-settings: "ss01"; font-variation-settings: normal; 
font-size: 1em; color: rgb(0, 206, 185);">_generate_natural_language_filename</code>: filename generation</li></ul></li></ol><h2 style="border: 0px solid; margin-top: 2rem; margin-bottom: 0.75rem; padding: 0px; font-size: 14px; font-weight: 500; color: rgb(254, 253, 253); line-height: 21px; font-family: Inter, "Inter Fallback"; background-color: rgb(21, 21, 21);">Accomplished</h2><ol style="border: 0px solid; margin-top: 0.5rem; margin-bottom: 1rem; padding: 0px 0px 0px 2.25rem; list-style-position: outside; list-style-image: initial; color: rgb(254, 253, 253); font-family: Inter, "Inter Fallback"; font-size: 14px; background-color: rgb(21, 21, 21);"><li style="border: 0px solid; margin: 0px 0px 0.5rem; padding: 0px;">Added the <code style="border: 0px solid; margin: 0px; padding: 0px; font-family: "IBM Plex Mono", "IBM Plex Mono Fallback", ui-monospace, SFMono-Regular, Menlo, Monaco, Consolas, "Liberation Mono", "Courier New", monospace; font-feature-settings: "ss01"; font-variation-settings: normal; font-size: 1em; color: rgb(0, 206, 185);">_extract_english_part()</code> method to extract the English part (the path issue is now fixed)</li><li style="border: 0px solid; margin: 0px 0px 0.5rem; padding: 0px;">Added the <code style="border: 0px solid; margin: 0px; padding: 0px; font-family: "IBM Plex Mono", "IBM Plex Mono Fallback", ui-monospace, SFMono-Regular, Menlo, Monaco, Consolas, "Liberation Mono", "Courier New", monospace; font-feature-settings: "ss01"; font-variation-settings: normal; font-size: 1em; color: rgb(0, 206, 185);">_optimize_filename_with_llm()</code> method, which uses a Qwen model to optimize filenames</li><li style="border: 0px solid; margin: 0px 0px 0.5rem; padding: 0px;">Expanded the translation dictionary by about 150 words (birds, cats, dogs, body parts, numbers, and so on)</li><li style="border: 0px solid; margin: 0px 0px 0.5rem; padding: 0px;">Expanded the <code style="border: 0px solid; margin: 0px; padding: 0px; font-family: "IBM Plex Mono", "IBM Plex Mono Fallback", ui-monospace, SFMono-Regular, Menlo, Monaco, Consolas, "Liberation Mono", "Courier New", monospace; font-feature-settings: "ss01"; font-variation-settings: normal; 
font-size: 1em; color: rgb(0, 206, 185);">label_to_tags</code> mapping table (about 120 words)</li><li style="border: 0px solid; margin: 0px 0px 0.5rem; padding: 0px;">Fixed the <code style="border: 0px solid; margin: 0px; padding: 0px; font-family: "IBM Plex Mono", "IBM Plex Mono Fallback", ui-monospace, SFMono-Regular, Menlo, Monaco, Consolas, "Liberation Mono", "Courier New", monospace; font-feature-settings: "ss01"; font-variation-settings: normal; font-size: 1em; color: rgb(0, 206, 185);">NameError: name 'quoted_content' is not defined</code> error</li><li style="border: 0px solid; margin: 0px 0px 0.5rem; padding: 0px;">Added missing translation entries: finger→手指, five→五, six→六, and others</li><li style="border: 0px solid; margin: 0px 0px 0.5rem; padding: 0px;">Changed the filtering logic in several places so that single Chinese numerals pass through</li></ol><p style="border: 0px solid; margin-top: 0px; margin-bottom: 1rem; padding: 0px; color: rgb(254, 253, 253); font-family: Inter, "Inter Fallback"; font-size: 14px; background-color: rgb(21, 21, 21);"><span style="border: 0px solid; margin: 0px; padding: 0px;">In progress</span>: single Chinese numerals are still filtered out when the final filename is generated</p><h2 style="border: 0px solid; margin-top: 2rem; margin-bottom: 0.75rem; padding: 0px; font-size: 14px; font-weight: 500; color: rgb(254, 253, 253); line-height: 21px; font-family: Inter, "Inter Fallback"; background-color: rgb(21, 21, 21);">Relevant files / directories</h2><ul style="border: 0px solid; margin-top: 0.5rem; margin-bottom: 1rem; padding: 0px 0px 0px 1.5rem; list-style-position: outside; list-style-image: initial; color: rgb(254, 253, 253); font-family: Inter, "Inter Fallback"; font-size: 14px; background-color: rgb(21, 21, 21);"><li style="border: 0px solid; margin: 0px 0px 0.5rem; padding: 0px;"><p style="border: 0px solid; margin-top: 0px; margin-bottom: 0px; padding: 0px; display: inline;"><code style="border: 0px solid; margin: 0px; padding: 0px; font-family: "IBM Plex Mono", "IBM Plex Mono Fallback", ui-monospace, SFMono-Regular, Menlo, Monaco, Consolas, "Liberation Mono", "Courier New", monospace; font-feature-settings: "ss01"; 
font-variation-settings: normal; font-size: 1em; color: rgb(0, 206, 185);">python/classifier_v2.py</code> - main classifier, contains the filename-handling logic</p><ul style="border: 0px solid; margin-top: 0.25rem; margin-bottom: 0.25rem; padding: 0px 0px 0px 1rem; list-style: outside disc;"><li style="border: 0px solid; margin: 0px 0px 0.5rem; padding: 0px;"><code style="border: 0px solid; margin: 0px; padding: 0px; font-family: "IBM Plex Mono", "IBM Plex Mono Fallback", ui-monospace, SFMono-Regular, Menlo, Monaco, Consolas, "Liberation Mono", "Courier New", monospace; font-feature-settings: "ss01"; font-variation-settings: normal; font-size: 1em; color: rgb(0, 206, 185);">_extract_english_part()</code> - extracts the English part</li><li style="border: 0px solid; margin: 0px 0px 0.5rem; padding: 0px;"><code style="border: 0px solid; margin: 0px; padding: 0px; font-family: "IBM Plex Mono", "IBM Plex Mono Fallback", ui-monospace, SFMono-Regular, Menlo, Monaco, Consolas, "Liberation Mono", "Courier New", monospace; font-feature-settings: "ss01"; font-variation-settings: normal; font-size: 1em; color: rgb(0, 206, 185);">_extract_semantic_words()</code> - extracts semantic words</li><li style="border: 0px solid; margin: 0px 0px 0.5rem; padding: 0px;"><code style="border: 0px solid; margin: 0px; padding: 0px; font-family: "IBM Plex Mono", "IBM Plex Mono Fallback", ui-monospace, SFMono-Regular, Menlo, Monaco, Consolas, "Liberation Mono", "Courier New", monospace; font-feature-settings: "ss01"; font-variation-settings: normal; font-size: 1em; color: rgb(0, 206, 185);">_generate_natural_language_filename()</code> - generates the new filename</li><li style="border: 0px solid; margin: 0px 0px 0.5rem; padding: 0px;"><code style="border: 0px solid; margin: 0px; padding: 0px; font-family: "IBM Plex Mono", "IBM Plex Mono Fallback", ui-monospace, SFMono-Regular, Menlo, Monaco, Consolas, "Liberation Mono", "Courier New", monospace; font-feature-settings: "ss01"; font-variation-settings: normal; font-size: 1em; color: rgb(0, 206, 
185);">_translate_filename_to_chinese()</code> - translates English to Chinese</li><li style="border: 0px solid; margin: 0px 0px 0.5rem; padding: 0px;"><code style="border: 0px solid; margin: 0px; padding: 0px; font-family: "IBM Plex Mono", "IBM Plex Mono Fallback", ui-monospace, SFMono-Regular, Menlo, Monaco, Consolas, "Liberation Mono", "Courier New", monospace; font-feature-settings: "ss01"; font-variation-settings: normal; font-size: 1em; color: rgb(0, 206, 185);">_optimize_filename_with_llm()</code> - optimizes the filename with an LLM</li></ul></li><li style="border: 0px solid; margin: 0px 0px 0.5rem; padding: 0px;"><p style="border: 0px solid; margin-top: 0px; margin-bottom: 0px; padding: 0px; display: inline;"><code style="border: 0px solid; margin: 0px; padding: 0px; font-family: "IBM Plex Mono", "IBM Plex Mono Fallback", ui-monospace, SFMono-Regular, Menlo, Monaco, Consolas, "Liberation Mono", "Courier New", monospace; font-feature-settings: "ss01"; font-variation-settings: normal; font-size: 1em; color: rgb(0, 206, 185);">python/utils/tag_generator.py</code> - tag generation utilities</p><ul style="border: 0px solid; margin-top: 0.25rem; margin-bottom: 0.25rem; padding: 0px 0px 0px 1rem; list-style: outside disc;"><li style="border: 0px solid; margin: 0px 0px 0.5rem; padding: 0px;"><code style="border: 0px solid; margin: 0px; padding: 0px; font-family: "IBM Plex Mono", "IBM Plex Mono Fallback", ui-monospace, SFMono-Regular, Menlo, Monaco, Consolas, "Liberation Mono", "Courier New", monospace; font-feature-settings: "ss01"; font-variation-settings: normal; font-size: 1em; color: rgb(0, 206, 185);">extract_filename_keywords()</code> - extracts keywords</li><li style="border: 0px solid; margin: 0px 0px 0.5rem; padding: 0px;"><code style="border: 0px solid; margin: 0px; padding: 0px; font-family: "IBM Plex Mono", "IBM Plex Mono Fallback", ui-monospace, SFMono-Regular, Menlo, Monaco, Consolas, "Liberation Mono", "Courier New", monospace; font-feature-settings: "ss01"; font-variation-settings: normal; font-size: 1em; color: rgb(0, 206, 
185);">_filter_meaningless_tags()</code> - filters out meaningless tags</li><li style="border: 0px solid; margin: 0px 0px 0.5rem; padding: 0px;"><code style="border: 0px solid; margin: 0px; padding: 0px; font-family: "IBM Plex Mono", "IBM Plex Mono Fallback", ui-monospace, SFMono-Regular, Menlo, Monaco, Consolas, "Liberation Mono", "Courier New", monospace; font-feature-settings: "ss01"; font-variation-settings: normal; font-size: 1em; color: rgb(0, 206, 185);">MEANINGLESS_TAGS</code> - list of meaningless tags</li><li style="border: 0px solid; margin: 0px 0px 0.5rem; padding: 0px;"><code style="border: 0px solid; margin: 0px; padding: 0px; font-family: "IBM Plex Mono", "IBM Plex Mono Fallback", ui-monospace, SFMono-Regular, Menlo, Monaco, Consolas, "Liberation Mono", "Courier New", monospace; font-feature-settings: "ss01"; font-variation-settings: normal; font-size: 1em; color: rgb(0, 206, 185);">translation_map</code> - translation dictionary</li></ul></li><li style="border: 0px solid; margin: 0px 0px 0.5rem; padding: 0px;"><p style="border: 0px solid; margin-top: 0px; margin-bottom: 0px; padding: 0px; display: inline;"><code style="border: 0px solid; margin: 0px; padding: 0px; font-family: "IBM Plex Mono", "IBM Plex Mono Fallback", ui-monospace, SFMono-Regular, Menlo, Monaco, Consolas, "Liberation Mono", "Courier New", monospace; font-feature-settings: "ss01"; font-variation-settings: normal; font-size: 1em; color: rgb(0, 206, 185);">python/utils/local_llm_tags.py</code> - local LLM tag generator</p></li></ul><div data-component="markdown-code" style="border: 0px solid; margin: 0px; padding: 0px; position: relative; color: rgb(254, 253, 253); font-family: Inter, "Inter Fallback"; font-size: 14px; background-color: rgb(21, 21, 21);"><pre class="shiki OpenCode" style="border-width: 0.666667px; border-style: solid; border-color: rgb(59, 58, 57); border-image: none 100% / 1 / 0 stretch; margin: 2rem 0px; padding: 8px 12px; font-family: "IBM Plex Mono", "IBM Plex Mono Fallback", ui-monospace, SFMono-Regular, Menlo, Monaco, Consolas, 
"Liberation Mono", "Courier New", monospace; font-feature-settings: "ss01"; font-variation-settings: normal; font-size: 13px; scrollbar-width: none; overflow: auto; border-radius: 6px; color: rgb(241, 236, 232);"><code style="border: 0px solid; margin: 0px; padding: 0px; font-family: "IBM Plex Mono", "IBM Plex Mono Fallback", ui-monospace, SFMono-Regular, Menlo, Monaco, Consolas, "Liberation Mono", "Courier New", monospace; font-feature-settings: "ss01"; font-variation-settings: normal; font-size: 1em;"><span class="line" style="border: 0px solid; margin: 0px; padding: 0px;"><span style="border: 0px solid; margin: 0px; padding: 0px;">File: Five finger whistles..07034113.wav</span></span>
<span class="line" style="border: 0px solid; margin: 0px; padding: 0px;"><span style="border: 0px solid; margin: 0px; padding: 0px;">Semantic: ['五', '手指', '口哨'] ✅ correct</span></span>
<span class="line" style="border: 0px solid; margin: 0px; padding: 0px;"><span style="border: 0px solid; margin: 0px; padding: 0px;">Keywords: ['五', '手指', '口哨'] ✅ correct</span></span>
<span class="line" style="border: 0px solid; margin: 0px; padding: 0px;"><span style="border: 0px solid; margin: 0px; padding: 0px;">CN Tags: ['五', '手指', '口哨', ...] ✅ correct</span></span>
<span class="line" style="border: 0px solid; margin: 0px; padding: 0px;"><span style="border: 0px solid; margin: 0px; padding: 0px;">New Name: 手指_口哨_2df943.wav ❌ missing '五'</span></span></code></pre></div>
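The two fixes described above (extracting English from the filename only, and letting single Chinese numerals through the length filter) might look roughly like the following. This is a minimal sketch under assumptions: `extract_english_part`, `keep_word`, `CJK_NUMERALS`, and the example directory path are hypothetical stand-ins, not the actual code in `python/classifier_v2.py`.

```python
import os
import re

# Hypothetical allow-list of single-character Chinese numerals.
CJK_NUMERALS = set("零一二三四五六七八九十百千万两")


def extract_english_part(path: str) -> str:
    """Return the English words of the file name only, ignoring directories."""
    # basename() drops the directory components; splitext() drops ".wav".
    stem = os.path.splitext(os.path.basename(path))[0]
    # Keep runs of ASCII letters; digits and punctuation are discarded.
    return " ".join(re.findall(r"[A-Za-z]+", stem))


def keep_word(word: str) -> bool:
    """Length filter that still lets a single Chinese numeral through."""
    return len(word) >= 2 or word in CJK_NUMERALS


# The directory name here is made up for illustration.
path = "/library/Animal Sounds/Five finger whistles..07034113.wav"
print(extract_english_part(path))  # Five finger whistles

words = ["五", "手指", "口哨", "的"]
print("_".join(w for w in words if keep_word(w)))  # 五_手指_口哨
```

Putting the numeral exception in one shared predicate, rather than repeating `len(word) >= 2` in each function, is one way to avoid the situation shown in the output above, where the semantic words and tags are correct but the final filename still loses '五'.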