<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:content="http://purl.org/rss/1.0/modules/content/"><channel><title>运维实践 on 超越网</title><link>https://www.chaoyuewang.cn/categories/ops/</link><description>Recent content in 运维实践 on 超越网</description><generator>Hugo</generator><language>zh-cn</language><lastBuildDate>Fri, 29 May 2026 10:20:00 +0800</lastBuildDate><atom:link href="https://www.chaoyuewang.cn/categories/ops/index.xml" rel="self" type="application/rss+xml"/><item><title>Kunpeng Hardware-Software Co-Design for AI4S in Practice: From Hardware Stacking to System-Level Synergy</title><link>https://www.chaoyuewang.cn/posts/ops/kunpeng-ai4s-practice/</link><pubDate>Fri, 29 May 2026 10:20:00 +0800</pubDate><guid>https://www.chaoyuewang.cn/posts/ops/kunpeng-ai4s-practice/</guid><description>&lt;h2 id="前言"&gt;Preface&lt;/h2&gt;
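&lt;p&gt;In May 2026, Kunpeng announced a new hardware-software co-design paradigm for AI for Science (AI4S). The traditional &amp;quot;hardware stacking&amp;quot; model is giving way to &amp;quot;system-level synergy and intelligence-driven design&amp;quot;.&lt;/p&gt;
&lt;p&gt;As an operations engineer, I was closely involved in deploying an AI4S project on the Kunpeng platform. This article records the practical experience and key findings.&lt;/p&gt;
&lt;h2 id="一ai4s-的挑战"&gt;1. The Challenges of AI4S&lt;/h2&gt;
&lt;h3 id="11-传统hpc的局限"&gt;1.1 Limits of Traditional HPC&lt;/h3&gt;
&lt;p&gt;In traditional high-performance computing:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Workloads are dominated by domain-specific numerical algorithms&lt;/li&gt;
&lt;li&gt;Tuning methods target a specific hardware architecture&lt;/li&gt;
&lt;li&gt;Efficiency drops when AI operators are mixed with traditional computation&lt;/li&gt;
&lt;/ul&gt;
&lt;h3 id="12-ai4s-的新需求"&gt;1.2 New Requirements of AI4S&lt;/h3&gt;
&lt;p&gt;AI4S brings deep-learning-driven scientific computing:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;The compute graph is driven by AI operators&lt;/li&gt;
&lt;li&gt;It has to interact dynamically with traditional HPC code&lt;/li&gt;
&lt;li&gt;The hybrid compute model requires tight hardware-software co-design&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id="二鲲鹏软硬协同架构"&gt;2. Kunpeng Hardware-Software Co-Design Architecture&lt;/h2&gt;
&lt;h3 id="21-核心组件"&gt;2.1 Core Components&lt;/h3&gt;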
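&lt;pre tabindex="0"&gt;&lt;code&gt;┌───────────────────────────────────────────────────────────────────────────┐
│           AI4S application layer                                          │
│  (molecular dynamics / gene sequencing / materials simulation)            │
├───────────────────────────────────────────────────────────────────────────┤
│           Hybrid compute scheduling layer                                 │
│  (dynamic scheduling of AI operators + traditional numerical algorithms)  │
├───────────────────────────────────────────────────────────────────────────┤
│           Kunpeng compute framework                                       │
│  (Ascend CANN + MindSpore + MPI)                                          │
├───────────────────────────────────────────────────────────────────────────┤
│           Kunpeng hardware layer                                          │
│  (Kunpeng CPU + Ascend NPU + high-speed interconnect)                     │
└───────────────────────────────────────────────────────────────────────────┘
&lt;/code&gt;&lt;/pre&gt;&lt;h3 id="22-关键技术创新"&gt;2.2 Key Technical Innovations&lt;/h3&gt;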
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Technique&lt;/th&gt;
&lt;th&gt;Description&lt;/th&gt;
&lt;th&gt;Effect&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Operator fusion&lt;/td&gt;
&lt;td&gt;AI operators and traditional operators are fused and executed together&lt;/td&gt;
&lt;td&gt;Less data movement&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Dynamic scheduling&lt;/td&gt;
&lt;td&gt;The compute unit is chosen automatically based on load&lt;/td&gt;
&lt;td&gt;Higher resource utilization&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Memory optimization&lt;/td&gt;
&lt;td&gt;Unified memory management with fewer copies&lt;/td&gt;
&lt;td&gt;30% lower latency&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Communication optimization&lt;/td&gt;
&lt;td&gt;High-performance communication over RoCE&lt;/td&gt;
&lt;td&gt;95% multi-node scaling linearity&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;h2 id="三部署实践"&gt;3. Deployment in Practice&lt;/h2&gt;
&lt;h3 id="31-环境配置"&gt;3.1 Environment&lt;/h3&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Component&lt;/th&gt;
&lt;th&gt;Version&lt;/th&gt;
&lt;th&gt;Configuration&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Operating system&lt;/td&gt;
&lt;td&gt;openEuler 24.03&lt;/td&gt;
&lt;td&gt;LTS&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;CPU&lt;/td&gt;
&lt;td&gt;Kunpeng 920 × 4&lt;/td&gt;
&lt;td&gt;64 cores each&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;NPU&lt;/td&gt;
&lt;td&gt;Ascend 910B × 8&lt;/td&gt;
&lt;td&gt;64 GB each&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Network&lt;/td&gt;
&lt;td&gt;RoCE v2&lt;/td&gt;
&lt;td&gt;200 Gbps&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Storage&lt;/td&gt;
&lt;td&gt;NVMe RAID&lt;/td&gt;
&lt;td&gt;100 TB&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;h3 id="32-部署步骤"&gt;3.2 Deployment Steps&lt;/h3&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" class="chroma"&gt;&lt;code class="language-bash" data-lang="bash"&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="c1"&gt;# 1. Install the CANN toolkit&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="c1"&gt;# Download the aarch64 toolkit .run package from https://www.hiascend.com/software/cann/archive first (file name pattern assumed)&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;chmod +x Ascend-cann-toolkit_*_linux-aarch64.run
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;./Ascend-cann-toolkit_*_linux-aarch64.run --install
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="c1"&gt;# 2. Set environment variables (default root install path)&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="nb"&gt;source&lt;/span&gt; /usr/local/Ascend/ascend-toolkit/set_env.sh
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="c1"&gt;# 3. Install MindSpore&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;pip install &lt;span class="nv"&gt;mindspore&lt;/span&gt;&lt;span class="o"&gt;==&lt;/span&gt;2.3.0
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="c1"&gt;# 4. Launch the application with MPI (64 ranks, 8 per node)&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;mpirun -n &lt;span class="m"&gt;64&lt;/span&gt; --map-by ppr:8:node ./ai4s_app --config config.yaml
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;h3 id="33-性能调优"&gt;3.3 Performance Tuning&lt;/h3&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Tuning item&lt;/th&gt;
&lt;th&gt;Parameter&lt;/th&gt;
&lt;th&gt;Effect&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Operator fusion threshold&lt;/td&gt;
&lt;td&gt;&lt;code&gt;fusion_threshold=0.8&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;20% fewer kernel launches&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Memory pool size&lt;/td&gt;
&lt;td&gt;&lt;code&gt;mem_pool_size=32GB&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Less memory fragmentation&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Communication batch size&lt;/td&gt;
&lt;td&gt;&lt;code&gt;comm_batch_size=64&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;15% higher communication efficiency&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Pipeline depth&lt;/td&gt;
&lt;td&gt;&lt;code&gt;pipeline_depth=4&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Hides compute latency&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;h2 id="四性能对比"&gt;4. Performance Comparison&lt;/h2&gt;
&lt;h3 id="41-基准测试"&gt;4.1 Benchmarks&lt;/h3&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Application&lt;/th&gt;
&lt;th&gt;Traditional HPC&lt;/th&gt;
&lt;th&gt;Kunpeng AI4S&lt;/th&gt;
&lt;th&gt;Improvement&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Molecular dynamics simulation&lt;/td&gt;
&lt;td&gt;100%&lt;/td&gt;
&lt;td&gt;185%&lt;/td&gt;
&lt;td&gt;85%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Gene sequence analysis&lt;/td&gt;
&lt;td&gt;100%&lt;/td&gt;
&lt;td&gt;210%&lt;/td&gt;
&lt;td&gt;110%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Material structure prediction&lt;/td&gt;
&lt;td&gt;100%&lt;/td&gt;
&lt;td&gt;165%&lt;/td&gt;
&lt;td&gt;65%&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;h3 id="42-资源利用率"&gt;4.2 Resource Utilization&lt;/h3&gt;
&lt;pre tabindex="0"&gt;&lt;code&gt;Traditional HPC: CPU 65%  NPU idle
Kunpeng AI4S:    CPU 85%  NPU 92%
&lt;/code&gt;&lt;/pre&gt;&lt;h2 id="五运维经验"&gt;5. Operations Experience&lt;/h2&gt;
&lt;h3 id="51-监控体系"&gt;5.1 Monitoring&lt;/h3&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" class="chroma"&gt;&lt;code class="language-yaml" data-lang="yaml"&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="c"&gt;# Prometheus scrape configuration&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="nt"&gt;scrape_configs&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt;  &lt;/span&gt;- &lt;span class="nt"&gt;job_name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s1"&gt;&amp;#39;kunpeng-npu&amp;#39;&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="nt"&gt;static_configs&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt;      &lt;/span&gt;- &lt;span class="nt"&gt;targets&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s1"&gt;&amp;#39;npu-exporter:9090&amp;#39;&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="nt"&gt;metrics_path&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l"&gt;/metrics&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt;  &lt;/span&gt;- &lt;span class="nt"&gt;job_name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s1"&gt;&amp;#39;ai4s-application&amp;#39;&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="nt"&gt;static_configs&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt;      &lt;/span&gt;- &lt;span class="nt"&gt;targets&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s1"&gt;&amp;#39;app-monitor:9091&amp;#39;&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;h3 id="52-常见问题"&gt;5.2 Common Issues&lt;/h3&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Problem&lt;/th&gt;
&lt;th&gt;Cause&lt;/th&gt;
&lt;th&gt;Fix&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Low NPU utilization&lt;/td&gt;
&lt;td&gt;Operators not fused&lt;/td&gt;
&lt;td&gt;Adjust fusion_threshold&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Communication bottleneck&lt;/td&gt;
&lt;td&gt;Network congestion&lt;/td&gt;
&lt;td&gt;Enable PFC on the RoCE network&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Out of memory&lt;/td&gt;
&lt;td&gt;Poor device memory allocation&lt;/td&gt;
&lt;td&gt;Use memory-pool management&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Jobs queuing&lt;/td&gt;
&lt;td&gt;Scheduler configuration&lt;/td&gt;
&lt;td&gt;Adjust the priority policy&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;h2 id="六总结"&gt;6. Summary&lt;/h2&gt;
&lt;p&gt;Kunpeng hardware-software co-design offers a new computing paradigm for AI4S. Key lessons:&lt;/p&gt;</description><content:encoded><![CDATA[<h2 id="前言">Preface</h2>
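<p>In May 2026, Kunpeng announced a new hardware-software co-design paradigm for AI for Science (AI4S). The traditional &quot;hardware stacking&quot; model is giving way to &quot;system-level synergy and intelligence-driven design&quot;.</p>
<p>As an operations engineer, I was closely involved in deploying an AI4S project on the Kunpeng platform. This article records the practical experience and key findings.</p>
<h2 id="一ai4s-的挑战">1. The Challenges of AI4S</h2>
<h3 id="11-传统hpc的局限">1.1 Limits of Traditional HPC</h3>
<p>In traditional high-performance computing:</p>
<ul>
<li>Workloads are dominated by domain-specific numerical algorithms</li>
<li>Tuning methods target a specific hardware architecture</li>
<li>Efficiency drops when AI operators are mixed with traditional computation</li>
</ul>
<h3 id="12-ai4s-的新需求">1.2 New Requirements of AI4S</h3>
<p>AI4S brings deep-learning-driven scientific computing:</p>
<ul>
<li>The compute graph is driven by AI operators</li>
<li>It has to interact dynamically with traditional HPC code</li>
<li>The hybrid compute model requires tight hardware-software co-design</li>
</ul>
<h2 id="二鲲鹏软硬协同架构">2. Kunpeng Hardware-Software Co-Design Architecture</h2>
<h3 id="21-核心组件">2.1 Core Components</h3>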
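<pre tabindex="0"><code>┌───────────────────────────────────────────────────────────────────────────┐
│           AI4S application layer                                          │
│  (molecular dynamics / gene sequencing / materials simulation)            │
├───────────────────────────────────────────────────────────────────────────┤
│           Hybrid compute scheduling layer                                 │
│  (dynamic scheduling of AI operators + traditional numerical algorithms)  │
├───────────────────────────────────────────────────────────────────────────┤
│           Kunpeng compute framework                                       │
│  (Ascend CANN + MindSpore + MPI)                                          │
├───────────────────────────────────────────────────────────────────────────┤
│           Kunpeng hardware layer                                          │
│  (Kunpeng CPU + Ascend NPU + high-speed interconnect)                     │
└───────────────────────────────────────────────────────────────────────────┘
</code></pre><h3 id="22-关键技术创新">2.2 Key Technical Innovations</h3>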
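<table>
	<thead>
			<tr>
					<th>Technique</th>
					<th>Description</th>
					<th>Effect</th>
			</tr>
	</thead>
	<tbody>
			<tr>
					<td>Operator fusion</td>
					<td>AI operators and traditional operators are fused and executed together</td>
					<td>Less data movement</td>
			</tr>
			<tr>
					<td>Dynamic scheduling</td>
					<td>The compute unit is chosen automatically based on load</td>
					<td>Higher resource utilization</td>
			</tr>
			<tr>
					<td>Memory optimization</td>
					<td>Unified memory management with fewer copies</td>
					<td>30% lower latency</td>
			</tr>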
			<tr>
					<td>Communication optimization</td>
					<td>High-performance communication over RoCE</td>
					<td>95% multi-node scaling linearity</td>
			</tr>
	</tbody>
</table>
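<p>The article does not define &quot;multi-node scaling linearity&quot;. One common reading is measured cluster throughput divided by the throughput that perfect linear scaling would give. The sketch below assumes that definition; the function name and the sample numbers are hypothetical and are not measurements from this deployment.</p>
<pre tabindex="0"><code class="language-python" data-lang="python">def scaling_linearity(single_node_throughput, n_nodes, cluster_throughput):
    """Measured cluster throughput as a fraction of ideal linear scaling."""
    ideal_throughput = single_node_throughput * n_nodes
    return cluster_throughput / ideal_throughput


# Hypothetical example: 1000 samples/s on one node, 7600 samples/s on 8 nodes.
linearity = scaling_linearity(1000.0, 8, 7600.0)
print(f"scaling linearity: {linearity:.0%}")
</code></pre>
<p>Under this reading, a figure of 95% would mean the cluster delivers 95% of what the same number of fully independent nodes would deliver.</p>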
<h2 id="三部署实践">3. Deployment in Practice</h2>
<h3 id="31-环境配置">3.1 Environment</h3>
<table>
	<thead>
			<tr>
					<th>Component</th>
					<th>Version</th>
					<th>Configuration</th>
			</tr>
	</thead>
	<tbody>
			<tr>
					<td>Operating system</td>
					<td>openEuler 24.03</td>
					<td>LTS</td>
			</tr>
			<tr>
					<td>CPU</td>
					<td>Kunpeng 920 × 4</td>
					<td>64 cores each</td>
			</tr>
			<tr>
					<td>NPU</td>
					<td>Ascend 910B × 8</td>
					<td>64 GB each</td>
			</tr>
			<tr>
					<td>Network</td>
					<td>RoCE v2</td>
					<td>200 Gbps</td>
			</tr>
			<tr>
					<td>Storage</td>
					<td>NVMe RAID</td>
					<td>100 TB</td>
			</tr>
	</tbody>
</table>
<h3 id="32-部署步骤">3.2 Deployment Steps</h3>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="cl"><span class="c1"># 1. Install the CANN toolkit</span>
</span></span><span class="line"><span class="cl"><span class="c1"># Download the aarch64 toolkit .run package from https://www.hiascend.com/software/cann/archive first (file name pattern assumed)</span>
</span></span><span class="line"><span class="cl">chmod +x Ascend-cann-toolkit_*_linux-aarch64.run
</span></span><span class="line"><span class="cl">./Ascend-cann-toolkit_*_linux-aarch64.run --install
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl"><span class="c1"># 2. Set environment variables (default root install path)</span>
</span></span><span class="line"><span class="cl"><span class="nb">source</span> /usr/local/Ascend/ascend-toolkit/set_env.sh
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl"><span class="c1"># 3. Install MindSpore</span>
</span></span><span class="line"><span class="cl">pip install <span class="nv">mindspore</span><span class="o">==</span>2.3.0
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl"><span class="c1"># 4. Launch the application with MPI (64 ranks, 8 per node)</span>
</span></span><span class="line"><span class="cl">mpirun -n <span class="m">64</span> --map-by ppr:8:node ./ai4s_app --config config.yaml
</span></span></code></pre></div><h3 id="33-性能调优">3.3 Performance Tuning</h3>
<table>
	<thead>
			<tr>
					<th>Tuning item</th>
					<th>Parameter</th>
					<th>Effect</th>
			</tr>
	</thead>
	<tbody>
			<tr>
					<td>Operator fusion threshold</td>
					<td><code>fusion_threshold=0.8</code></td>
					<td>20% fewer kernel launches</td>
			</tr>
			<tr>
					<td>Memory pool size</td>
					<td><code>mem_pool_size=32GB</code></td>
					<td>Less memory fragmentation</td>
			</tr>
			<tr>
					<td>Communication batch size</td>
					<td><code>comm_batch_size=64</code></td>
					<td>15% higher communication efficiency</td>
			</tr>
			<tr>
					<td>Pipeline depth</td>
					<td><code>pipeline_depth=4</code></td>
					<td>Hides compute latency</td>
			</tr>
	</tbody>
</table>
<h2 id="四性能对比">4. Performance Comparison</h2>
<h3 id="41-基准测试">4.1 Benchmarks</h3>
<table>
	<thead>
			<tr>
					<th>Application</th>
					<th>Traditional HPC</th>
					<th>Kunpeng AI4S</th>
					<th>Improvement</th>
			</tr>
	</thead>
	<tbody>
			<tr>
					<td>Molecular dynamics simulation</td>
					<td>100%</td>
					<td>185%</td>
					<td>85%</td>
			</tr>
			<tr>
					<td>Gene sequence analysis</td>
					<td>100%</td>
					<td>210%</td>
					<td>110%</td>
			</tr>
			<tr>
					<td>Material structure prediction</td>
					<td>100%</td>
					<td>165%</td>
					<td>65%</td>
			</tr>
	</tbody>
</table>
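<p>The percentages in the benchmark table are easier to compare once converted to a speedup factor and to the share of wall-clock time saved. The sketch below assumes the figures are throughput relative to the traditional HPC baseline, which the article does not state explicitly; the function names are illustrative.</p>
<pre tabindex="0"><code class="language-python" data-lang="python"># Relative performance from the benchmark table (traditional HPC = 100%).
RELATIVE_PERF = {
    "molecular dynamics simulation": 185,
    "gene sequence analysis": 210,
    "material structure prediction": 165,
}


def speedup(relative_percent):
    """Speedup factor versus the 100% baseline."""
    return relative_percent / 100


def runtime_saved_percent(relative_percent):
    """Share of wall-clock time saved for a fixed amount of work."""
    return (1 - 100 / relative_percent) * 100


for name, perf in RELATIVE_PERF.items():
    print(f"{name}: {speedup(perf):.2f}x, {runtime_saved_percent(perf):.0f}% less time")
</code></pre>
<p>Read this way, an 85% improvement corresponds to a job finishing in a little over half the original time, not in 15% of it.</p>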
<h3 id="42-资源利用率">4.2 Resource Utilization</h3>
<pre tabindex="0"><code>Traditional HPC: CPU 65%  NPU idle
Kunpeng AI4S:    CPU 85%  NPU 92%
</code></pre><h2 id="五运维经验">5. Operations Experience</h2>
<h3 id="51-监控体系">5.1 Monitoring</h3>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-yaml" data-lang="yaml"><span class="line"><span class="cl"><span class="c"># Prometheus scrape configuration</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="nt">scrape_configs</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">  </span>- <span class="nt">job_name</span><span class="p">:</span><span class="w"> </span><span class="s1">&#39;kunpeng-npu&#39;</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">    </span><span class="nt">static_configs</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">      </span>- <span class="nt">targets</span><span class="p">:</span><span class="w"> </span><span class="p">[</span><span class="s1">&#39;npu-exporter:9090&#39;</span><span class="p">]</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">    </span><span class="nt">metrics_path</span><span class="p">:</span><span class="w"> </span><span class="l">/metrics</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">    
</span></span></span><span class="line"><span class="cl"><span class="w">  </span>- <span class="nt">job_name</span><span class="p">:</span><span class="w"> </span><span class="s1">&#39;ai4s-application&#39;</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">    </span><span class="nt">static_configs</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">      </span>- <span class="nt">targets</span><span class="p">:</span><span class="w"> </span><span class="p">[</span><span class="s1">&#39;app-monitor:9091&#39;</span><span class="p">]</span><span class="w">
</span></span></span></code></pre></div><h3 id="52-常见问题">5.2 Common Issues</h3>
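<table>
	<thead>
			<tr>
					<th>Problem</th>
					<th>Cause</th>
					<th>Fix</th>
			</tr>
	</thead>
	<tbody>
			<tr>
					<td>Low NPU utilization</td>
					<td>Operators not fused</td>
					<td>Adjust fusion_threshold</td>
			</tr>
			<tr>
					<td>Communication bottleneck</td>
					<td>Network congestion</td>
					<td>Enable PFC on the RoCE network</td>
			</tr>
			<tr>
					<td>Out of memory</td>
					<td>Poor device memory allocation</td>
					<td>Use memory-pool management</td>
			</tr>
			<tr>
					<td>Jobs queuing</td>
					<td>Scheduler configuration</td>
					<td>Adjust the priority policy</td>
			</tr>
	</tbody>
</table>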
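<h2 id="六总结">6. Summary</h2>
<p>Kunpeng hardware-software co-design offers a new computing paradigm for AI4S. Key lessons:</p>
<ol>
<li><strong>Don't just stack hardware</strong>: it takes system-level co-design</li>
<li><strong>Operator fusion is key</strong>: cutting data movement is at the core of the performance gains</li>
<li><strong>Monitor everything</strong>: CPU, NPU, network, and storage all need monitoring</li>
<li><strong>Tuning is iterative</strong>: no configuration is optimal on the first try</li>
</ol>
<hr>
<blockquote>
<p><strong>Sources</strong>: CSDN news, Huawei Kunpeng official documentation</p>
</blockquote>
]]></content:encoded></item><item><title>Self-Hosted GPU Server vs Cloud: An In-Depth One-Year Cost Comparison</title><link>https://www.chaoyuewang.cn/posts/ops/self-hosted-gpu-vs-cloud-cost-analysis/</link><pubDate>Fri, 29 May 2026 10:15:00 +0800</pubDate><guid>https://www.chaoyuewang.cn/posts/ops/self-hosted-gpu-vs-cloud-cost-analysis/</guid><description>&lt;h2 id="前言"&gt;Preface&lt;/h2&gt;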
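&lt;p&gt;In May 2026, CSDN reported on a former big-tech engineer who &amp;quot;spent $48,000 building a server at home&amp;quot;. A year later, he was saving $105 a day on average. That number caught my interest: is a self-hosted GPU server really worth it?&lt;/p&gt;
&lt;p&gt;I ran a detailed comparison using real figures.&lt;/p&gt;
&lt;h2 id="一测试场景"&gt;1. Test Scenario&lt;/h2&gt;
&lt;p&gt;Assumed requirements:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Daily development: 8 hours/day&lt;/li&gt;
&lt;li&gt;AI inference service: 16 hours/day&lt;/li&gt;
&lt;li&gt;Model training: concentrated on weekends&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id="二硬件配置对比"&gt;2. Hardware Comparison&lt;/h2&gt;
&lt;h3 id="21-自建方案"&gt;2.1 Self-Hosted Option&lt;/h3&gt;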
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Component&lt;/th&gt;
&lt;th&gt;Model&lt;/th&gt;
&lt;th&gt;Price&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;GPU&lt;/td&gt;
&lt;td&gt;RTX 4090 × 2&lt;/td&gt;
&lt;td&gt;$3,600 × 2&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;CPU&lt;/td&gt;
&lt;td&gt;AMD Ryzen 9 7950X&lt;/td&gt;
&lt;td&gt;$600&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Memory&lt;/td&gt;
&lt;td&gt;128GB DDR5&lt;/td&gt;
&lt;td&gt;$400&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Storage&lt;/td&gt;
&lt;td&gt;4TB NVMe SSD&lt;/td&gt;
&lt;td&gt;$300&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Motherboard + PSU + case&lt;/td&gt;
&lt;td&gt;-&lt;/td&gt;
&lt;td&gt;$800&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Total&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;-&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;$9,700&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;h3 id="22-云服务方案"&gt;2.2 Cloud Options&lt;/h3&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Instance type&lt;/th&gt;
&lt;th&gt;Configuration&lt;/th&gt;
&lt;th&gt;Monthly cost&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;AWS p4d.24xlarge&lt;/td&gt;
&lt;td&gt;8× A100 40GB&lt;/td&gt;
&lt;td&gt;$32,000/month&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Alibaba Cloud GN7i&lt;/td&gt;
&lt;td&gt;8× A10 24GB&lt;/td&gt;
&lt;td&gt;$15,000/month&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Tencent Cloud GN10X&lt;/td&gt;
&lt;td&gt;8× T4&lt;/td&gt;
&lt;td&gt;$8,000/month&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;&lt;strong&gt;Note&lt;/strong&gt;: cloud services are usually billed by instance size, so they cannot be matched exactly to an individual's needs.&lt;/p&gt;
&lt;h2 id="三成本对比一年期"&gt;3. Cost Comparison (One Year)&lt;/h2&gt;
&lt;h3 id="31-自建服务器"&gt;3.1 Self-Hosted Server&lt;/h3&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Item&lt;/th&gt;
&lt;th&gt;Amount&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Initial hardware outlay&lt;/td&gt;
&lt;td&gt;$9,700&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Electricity (running 24h)&lt;/td&gt;
&lt;td&gt;$2,400&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Network bandwidth (100Mbps)&lt;/td&gt;
&lt;td&gt;$600&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Maintenance&lt;/td&gt;
&lt;td&gt;$500&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Total for one year&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;$13,200&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;h3 id="32-云服务按需"&gt;3.2 Cloud (On-Demand)&lt;/h3&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Usage scenario&lt;/th&gt;
&lt;th&gt;Monthly cost&lt;/th&gt;
&lt;th&gt;Annual cost&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Development environment (1× A100)&lt;/td&gt;
&lt;td&gt;$3,200&lt;/td&gt;
&lt;td&gt;$38,400&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Inference service (2× A100)&lt;/td&gt;
&lt;td&gt;$6,400&lt;/td&gt;
&lt;td&gt;$76,800&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Training (8 hours on weekends)&lt;/td&gt;
&lt;td&gt;$2,000&lt;/td&gt;
&lt;td&gt;$24,000&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Total for one year&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;-&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;$139,200&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;h3 id="33-云服务预留实例"&gt;3.3 Cloud (Reserved Instances)&lt;/h3&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Type&lt;/th&gt;
&lt;th&gt;Discount&lt;/th&gt;
&lt;th&gt;Annual cost&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;1-year reserved&lt;/td&gt;
&lt;td&gt;30%&lt;/td&gt;
&lt;td&gt;$97,440&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;3-year reserved&lt;/td&gt;
&lt;td&gt;50%&lt;/td&gt;
&lt;td&gt;$69,600&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;h2 id="四关键指标对比"&gt;4. Key Metrics Compared&lt;/h2&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Dimension&lt;/th&gt;
&lt;th&gt;Self-hosted&lt;/th&gt;
&lt;th&gt;Cloud&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Initial outlay&lt;/td&gt;
&lt;td&gt;$9,700&lt;/td&gt;
&lt;td&gt;$0&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;One-year total cost&lt;/td&gt;
&lt;td&gt;$13,200&lt;/td&gt;
&lt;td&gt;$69,600+&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Two-year total cost&lt;/td&gt;
&lt;td&gt;$16,700&lt;/td&gt;
&lt;td&gt;$139,200+&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Three-year total cost&lt;/td&gt;
&lt;td&gt;$20,200&lt;/td&gt;
&lt;td&gt;$208,800+&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Data security&lt;/td&gt;
&lt;td&gt;Fully under your control&lt;/td&gt;
&lt;td&gt;Depends on the provider&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Scalability&lt;/td&gt;
&lt;td&gt;Manual upgrades&lt;/td&gt;
&lt;td&gt;Elastic scaling&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Maintenance responsibility&lt;/td&gt;
&lt;td&gt;Yours&lt;/td&gt;
&lt;td&gt;The provider's&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
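&lt;h2 id="五盈亏平衡点"&gt;5. Break-Even Point&lt;/h2&gt;
&lt;pre tabindex="0"&gt;&lt;code&gt;Self-hosted total cost = hardware + electricity + maintenance
Cloud total cost = monthly cost × 12
Break-even point = hardware outlay / (cloud monthly cost - self-hosted monthly running cost)
Assuming a 1× A100 instance:
Break-even point = $9,700 / ($3,200 - $250) ≈ 3.3 months
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;&lt;strong&gt;Conclusion&lt;/strong&gt;: once you have been using it for more than about 3.3 months, the self-hosted server starts saving money.&lt;/p&gt;</description><content:encoded><![CDATA[<h2 id="前言">Preface</h2>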
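<p>In May 2026, CSDN reported on a former big-tech engineer who &quot;spent $48,000 building a server at home&quot;. A year later, he was saving $105 a day on average. That number caught my interest: is a self-hosted GPU server really worth it?</p>
<p>I ran a detailed comparison using real figures.</p>
<h2 id="一测试场景">1. Test Scenario</h2>
<p>Assumed requirements:</p>
<ul>
<li>Daily development: 8 hours/day</li>
<li>AI inference service: 16 hours/day</li>
<li>Model training: concentrated on weekends</li>
</ul>
<h2 id="二硬件配置对比">2. Hardware Comparison</h2>
<h3 id="21-自建方案">2.1 Self-Hosted Option</h3>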
<table>
	<thead>
			<tr>
					<th>Component</th>
					<th>Model</th>
					<th>Price</th>
			</tr>
	</thead>
	<tbody>
			<tr>
					<td>GPU</td>
					<td>RTX 4090 × 2</td>
					<td>$3,600 × 2</td>
			</tr>
			<tr>
					<td>CPU</td>
					<td>AMD Ryzen 9 7950X</td>
					<td>$600</td>
			</tr>
			<tr>
					<td>Memory</td>
					<td>128GB DDR5</td>
					<td>$400</td>
			</tr>
			<tr>
					<td>Storage</td>
					<td>4TB NVMe SSD</td>
					<td>$300</td>
			</tr>
			<tr>
					<td>Motherboard + PSU + case</td>
					<td>-</td>
					<td>$800</td>
			</tr>
			<tr>
					<td><strong>Total</strong></td>
					<td>-</td>
					<td><strong>$9,700</strong></td>
			</tr>
	</tbody>
</table>
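<p>As a quick sanity check, the itemized prices in the table above can be summed directly. The rows as listed add up to $9,300, which is $400 short of the stated $9,700; the article does not say what accounts for the difference, and every later calculation uses $9,700. The dictionary keys below are just labels for this sketch.</p>
<pre tabindex="0"><code class="language-python" data-lang="python"># Itemized hardware prices in USD, copied from the table.
HARDWARE_USD = {
    "GPU (RTX 4090 x 2)": 3_600 * 2,
    "CPU (AMD Ryzen 9 7950X)": 600,
    "Memory (128GB DDR5)": 400,
    "Storage (4TB NVMe SSD)": 300,
    "Motherboard + PSU + case": 800,
}
STATED_TOTAL_USD = 9_700  # total printed in the table

itemized_total = sum(HARDWARE_USD.values())
print(f"itemized total: ${itemized_total:,}")
print(f"gap to stated total: ${STATED_TOTAL_USD - itemized_total:,}")
</code></pre>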
<h3 id="22-云服务方案">2.2 Cloud Options</h3>
<table>
	<thead>
			<tr>
					<th>Instance type</th>
					<th>Configuration</th>
					<th>Monthly cost</th>
			</tr>
	</thead>
	<tbody>
			<tr>
					<td>AWS p4d.24xlarge</td>
					<td>8× A100 40GB</td>
					<td>$32,000/month</td>
			</tr>
			<tr>
					<td>Alibaba Cloud GN7i</td>
					<td>8× A10 24GB</td>
					<td>$15,000/month</td>
			</tr>
			<tr>
					<td>Tencent Cloud GN10X</td>
					<td>8× T4</td>
					<td>$8,000/month</td>
			</tr>
	</tbody>
</table>
<p><strong>Note</strong>: cloud services are usually billed by instance size, so they cannot be matched exactly to an individual's needs.</p>
<h2 id="三成本对比一年期">3. Cost Comparison (One Year)</h2>
<h3 id="31-自建服务器">3.1 Self-Hosted Server</h3>
<table>
	<thead>
			<tr>
					<th>Item</th>
					<th>Amount</th>
			</tr>
	</thead>
	<tbody>
			<tr>
					<td>Initial hardware outlay</td>
					<td>$9,700</td>
			</tr>
			<tr>
					<td>Electricity (running 24h)</td>
					<td>$2,400</td>
			</tr>
			<tr>
					<td>Network bandwidth (100Mbps)</td>
					<td>$600</td>
			</tr>
			<tr>
					<td>Maintenance</td>
					<td>$500</td>
			</tr>
			<tr>
					<td><strong>Total for one year</strong></td>
					<td><strong>$13,200</strong></td>
			</tr>
	</tbody>
</table>
<h3 id="32-云服务按需">3.2 Cloud (On-Demand)</h3>
<table>
	<thead>
			<tr>
					<th>Usage scenario</th>
					<th>Monthly cost</th>
					<th>Annual cost</th>
			</tr>
	</thead>
	<tbody>
			<tr>
					<td>Development environment (1× A100)</td>
					<td>$3,200</td>
					<td>$38,400</td>
			</tr>
			<tr>
					<td>Inference service (2× A100)</td>
					<td>$6,400</td>
					<td>$76,800</td>
			</tr>
			<tr>
					<td>Training (8 hours on weekends)</td>
					<td>$2,000</td>
					<td>$24,000</td>
			</tr>
			<tr>
					<td><strong>Total for one year</strong></td>
					<td>-</td>
					<td><strong>$139,200</strong></td>
			</tr>
	</tbody>
</table>
<h3 id="33-云服务预留实例">3.3 Cloud (Reserved Instances)</h3>
<table>
	<thead>
			<tr>
					<th>Type</th>
					<th>Discount</th>
					<th>Annual cost</th>
			</tr>
	</thead>
	<tbody>
			<tr>
					<td>1-year reserved</td>
					<td>30%</td>
					<td>$97,440</td>
			</tr>
			<tr>
					<td>3-year reserved</td>
					<td>50%</td>
					<td>$69,600</td>
			</tr>
	</tbody>
</table>
<h2 id="四关键指标对比">4. Key Metrics Compared</h2>
<table>
	<thead>
			<tr>
					<th>Dimension</th>
					<th>Self-hosted</th>
					<th>Cloud</th>
			</tr>
	</thead>
	<tbody>
			<tr>
					<td>Initial outlay</td>
					<td>$9,700</td>
					<td>$0</td>
			</tr>
			<tr>
					<td>One-year total cost</td>
					<td>$13,200</td>
					<td>$69,600+</td>
			</tr>
			<tr>
					<td>Two-year total cost</td>
					<td>$16,700</td>
					<td>$139,200+</td>
			</tr>
			<tr>
					<td>Three-year total cost</td>
					<td>$20,200</td>
					<td>$208,800+</td>
			</tr>
			<tr>
					<td>Data security</td>
					<td>Fully under your control</td>
					<td>Depends on the provider</td>
			</tr>
			<tr>
					<td>Scalability</td>
					<td>Manual upgrades</td>
					<td>Elastic scaling</td>
			</tr>
			<tr>
					<td>Maintenance responsibility</td>
					<td>Yours</td>
					<td>The provider's</td>
			</tr>
	</tbody>
</table>
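<p>The multi-year rows in the table can be reproduced from the earlier figures. The sketch below assumes the hardware is bought once, that running costs (electricity, bandwidth, maintenance) stay flat each year, and that the cloud column uses the 3-year reserved rate of $69,600 per year; it ignores depreciation and hardware replacement. The function names are illustrative.</p>
<pre tabindex="0"><code class="language-python" data-lang="python">HARDWARE_USD = 9_700
# Electricity + bandwidth + maintenance, per year, from section 3.1.
SELF_HOSTED_RUNNING_USD_PER_YEAR = 2_400 + 600 + 500
# 3-year reserved-instance rate from section 3.3.
CLOUD_USD_PER_YEAR = 69_600


def self_hosted_total(years):
    """One-off hardware cost plus flat yearly running costs."""
    return HARDWARE_USD + SELF_HOSTED_RUNNING_USD_PER_YEAR * years


def cloud_total(years):
    """Cumulative cloud spend at a flat yearly rate."""
    return CLOUD_USD_PER_YEAR * years


for years in (1, 2, 3):
    print(f"{years} year(s): self-hosted ${self_hosted_total(years):,}, cloud ${cloud_total(years):,}")
</code></pre>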
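<h2 id="五盈亏平衡点">5. Break-Even Point</h2>
<pre tabindex="0"><code>Self-hosted total cost = hardware + electricity + maintenance
Cloud total cost = monthly cost × 12

Break-even point = hardware outlay / (cloud monthly cost - self-hosted monthly running cost)

Assuming a 1× A100 instance:
Break-even point = $9,700 / ($3,200 - $250) ≈ 3.3 months
</code></pre><p><strong>Conclusion</strong>: once you have been using it for more than about 3.3 months, the self-hosted server starts saving money.</p>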
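<p>The break-even formula is easy to re-run with your own numbers. The sketch below uses the article's inputs ($9,700 of hardware, $3,200 per month for one cloud A100, $250 per month of self-hosted running cost); the function and parameter names are hypothetical.</p>
<pre tabindex="0"><code class="language-python" data-lang="python">def break_even_months(hardware_usd, cloud_usd_per_month, self_hosted_usd_per_month):
    """Months of use after which self-hosting becomes cheaper than the cloud."""
    monthly_saving = cloud_usd_per_month - self_hosted_usd_per_month
    if not monthly_saving > 0:
        raise ValueError("self-hosting never breaks even unless it is cheaper per month")
    return hardware_usd / monthly_saving


months = break_even_months(
    hardware_usd=9_700,
    cloud_usd_per_month=3_200,
    self_hosted_usd_per_month=250,
)
print(f"break-even after about {months:.1f} months")
</code></pre>
<p>The result is sensitive to the cloud price: comparing against a cheaper instance, or one that is only billed for the hours actually used, pushes the break-even point out.</p>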
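<h2 id="六风险与考量">6. Risks and Considerations</h2>
<h3 id="61-自建风险">6.1 Self-Hosting Risks</h3>
<ul>
<li><strong>Hardware failure</strong>: you bear the repair costs yourself</li>
<li><strong>Power stability</strong>: needs a UPS and backup power</li>
<li><strong>Network security</strong>: you configure the firewall and intrusion detection yourself</li>
<li><strong>Noise and cooling</strong>: a home environment needs special handling</li>
</ul>
<h3 id="62-云服务风险">6.2 Cloud Risks</h3>
<ul>
<li><strong>Vendor lock-in</strong>: migration is costly</li>
<li><strong>Price changes</strong>: providers may raise prices</li>
<li><strong>Data compliance</strong>: sensitive data needs special handling</li>
</ul>
<h2 id="七建议">7. Recommendations</h2>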
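<table>
	<thead>
			<tr>
					<th>User type</th>
					<th>Recommended option</th>
			</tr>
	</thead>
	<tbody>
			<tr>
					<td>Individual developers/learners</td>
					<td>Self-hosted + cloud hybrid</td>
			</tr>
			<tr>
					<td>Startups</td>
					<td>Cloud (early stage) → self-hosted (later)</td>
			</tr>
			<tr>
					<td>Small and medium businesses</td>
					<td>Cloud reserved instances</td>
			</tr>
			<tr>
					<td>Large enterprises</td>
					<td>Own data center</td>
			</tr>
	</tbody>
</table>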
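<h2 id="八总结">8. Summary</h2>
<p>A self-hosted GPU server has a clear cost advantage for <strong>long-term use</strong>, but you have to weigh maintenance costs, technical skills, and risk tolerance.</p>
<p>For most individual developers and small teams, a <strong>hybrid strategy</strong> is advisable:</p>
<ul>
<li>Daily development: self-hosted server</li>
<li>Burst demand: cloud capacity as an elastic top-up</li>
<li>Sensitive data: handled in the self-hosted environment</li>
</ul>
<hr>
<blockquote>
<p><strong>Sources</strong>: CSDN news; AWS, Alibaba Cloud, and Tencent Cloud pricing pages</p>
</blockquote>
]]></content:encoded></item><item><title>Setting Up a Local Kubernetes Development Environment: A Complete Guide from Zero to One</title><link>https://www.chaoyuewang.cn/posts/ops/kubernetes-local-dev/</link><pubDate>Thu, 28 May 2026 10:30:00 +0800</pubDate><guid>https://www.chaoyuewang.cn/posts/ops/kubernetes-local-dev/</guid><description>&lt;h2 id="前言"&gt;Preface&lt;/h2&gt;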
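&lt;p&gt;Until 2025, my local development environment was always Docker Compose. It took a serious outage caused by a configuration difference from production for me to realize that the local environment needs to be closer to production.&lt;/p&gt;
&lt;p&gt;This article documents the full process of setting up a local Kubernetes development environment, including tool selection, configuration tuning, and the development workflow.&lt;/p&gt;
&lt;h2 id="一为什么需要本地-k8s"&gt;1. Why a Local K8s&lt;/h2&gt;
&lt;h3 id="11-痛点分析"&gt;1.1 Pain Points&lt;/h3&gt;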
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Scenario&lt;/th&gt;
&lt;th&gt;Docker Compose&lt;/th&gt;
&lt;th&gt;Kubernetes&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;ConfigMap testing&lt;/td&gt;
&lt;td&gt;❌ Not supported&lt;/td&gt;
&lt;td&gt;✅ Native support&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Service discovery&lt;/td&gt;
&lt;td&gt;⚠️ Manual configuration&lt;/td&gt;
&lt;td&gt;✅ Automatic discovery&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Ingress routing&lt;/td&gt;
&lt;td&gt;❌ Not supported&lt;/td&gt;
&lt;td&gt;✅ Native support&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;HPA autoscaling&lt;/td&gt;
&lt;td&gt;❌ Not supported&lt;/td&gt;
&lt;td&gt;✅ Native support&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Production parity&lt;/td&gt;
&lt;td&gt;⚠️ Fairly low&lt;/td&gt;
&lt;td&gt;✅ High&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
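&lt;h3 id="12-核心价值"&gt;1.2 Core Value&lt;/h3&gt;
&lt;p&gt;&lt;strong&gt;&amp;ldquo;Local equals production&amp;rdquo;&lt;/strong&gt;: you can verify production configuration and behavior locally, which reduces surprises at deployment time.&lt;/p&gt;
&lt;h2 id="二工具选择"&gt;2. Choosing a Tool&lt;/h2&gt;
&lt;h3 id="21-主流方案对比"&gt;2.1 Comparison of Mainstream Options&lt;/h3&gt;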
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Tool&lt;/th&gt;
&lt;th&gt;Pros&lt;/th&gt;
&lt;th&gt;Cons&lt;/th&gt;
&lt;th&gt;Best for&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Minikube&lt;/td&gt;
&lt;td&gt;Full-featured, rich add-ons&lt;/td&gt;
&lt;td&gt;Slow startup, high resource usage&lt;/td&gt;
&lt;td&gt;Learning/testing&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Kind&lt;/td&gt;
&lt;td&gt;Fast startup, Docker backend&lt;/td&gt;
&lt;td&gt;Weak multi-cluster management&lt;/td&gt;
&lt;td&gt;Development/CI&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;K3s&lt;/td&gt;
&lt;td&gt;Lightweight, production-grade&lt;/td&gt;
&lt;td&gt;Slightly more complex configuration&lt;/td&gt;
&lt;td&gt;Edge/development&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Docker Desktop K8s&lt;/td&gt;
&lt;td&gt;One-click enable, well integrated&lt;/td&gt;
&lt;td&gt;High resource usage, tied to Docker Desktop&lt;/td&gt;
&lt;td&gt;Quick start&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Rancher Desktop&lt;/td&gt;
&lt;td&gt;Cross-platform, choice of container runtime&lt;/td&gt;
&lt;td&gt;Relatively new, smaller community&lt;/td&gt;
&lt;td&gt;Cross-platform development&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;h3 id="22-我的选择kind"&gt;2.2 My Choice: Kind&lt;/h3&gt;
&lt;p&gt;After comparison testing, I chose &lt;strong&gt;Kind (Kubernetes in Docker)&lt;/strong&gt; for my local development environment:&lt;/p&gt;</description><content:encoded><![CDATA[<h2 id="前言">Preface</h2>
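<p>Until 2025, my local development environment was always Docker Compose. It took a serious outage caused by a configuration difference from production for me to realize that the local environment needs to be closer to production.</p>
<p>This article documents the full process of setting up a local Kubernetes development environment, including tool selection, configuration tuning, and the development workflow.</p>
<h2 id="一为什么需要本地-k8s">1. Why a Local K8s</h2>
<h3 id="11-痛点分析">1.1 Pain Points</h3>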
<table>
	<thead>
			<tr>
					<th>Scenario</th>
					<th>Docker Compose</th>
					<th>Kubernetes</th>
			</tr>
	</thead>
	<tbody>
			<tr>
					<td>ConfigMap testing</td>
					<td>❌ Not supported</td>
					<td>✅ Native support</td>
			</tr>
			<tr>
					<td>Service discovery</td>
					<td>⚠️ Manual configuration</td>
					<td>✅ Automatic discovery</td>
			</tr>
			<tr>
					<td>Ingress routing</td>
					<td>❌ Not supported</td>
					<td>✅ Native support</td>
			</tr>
			<tr>
					<td>HPA autoscaling</td>
					<td>❌ Not supported</td>
					<td>✅ Native support</td>
			</tr>
			<tr>
					<td>Production parity</td>
					<td>⚠️ Fairly low</td>
					<td>✅ High</td>
			</tr>
	</tbody>
</table>
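<h3 id="12-核心价值">1.2 Core value</h3>
<p><strong>&ldquo;Local equals production&rdquo;</strong>: production configuration and behaviour can be verified locally, which reduces surprises at deployment time.</p>
<h2 id="二工具选择">2. Choosing a tool</h2>
<h3 id="21-主流方案对比">2.1 Comparison of mainstream options</h3>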
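<table>
	<thead>
			<tr>
					<th>Tool</th>
					<th>Pros</th>
					<th>Cons</th>
					<th>Best for</th>
			</tr>
	</thead>
	<tbody>
			<tr>
					<td>Minikube</td>
					<td>Full-featured, rich add-ons</td>
					<td>Slow startup, high resource usage</td>
					<td>Learning/testing</td>
			</tr>
			<tr>
					<td>Kind</td>
					<td>Fast startup, Docker backend</td>
					<td>Limited tooling for managing several clusters together</td>
					<td>Development/CI</td>
			</tr>
			<tr>
					<td>K3s</td>
					<td>Lightweight, production-grade</td>
					<td>Slightly more complex configuration</td>
					<td>Edge/development</td>
			</tr>
			<tr>
					<td>Docker Desktop K8s</td>
					<td>One-click enable, well integrated</td>
					<td>High resource usage, tied to Docker Desktop</td>
					<td>Quick start</td>
			</tr>
			<tr>
					<td>Rancher Desktop</td>
					<td>Cross-platform, choice of container runtime</td>
					<td>Relatively new, smaller community</td>
					<td>Cross-platform development</td>
			</tr>
	</tbody>
</table>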
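<h3 id="22-我的选择kind">2.2 My choice: Kind</h3>
<p>After comparison testing, I chose <strong>Kind (Kubernetes in Docker)</strong> as my local development environment:</p>
<ul>
<li>✅ Fast startup (~30 seconds)</li>
<li>✅ Low resource usage (~2 GB of memory)</li>
<li>✅ Multi-cluster support (isolated dev/test environments)</li>
<li>✅ Consistent with CI/CD (the same Kind setup can run inside GitHub Actions)</li>
</ul>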
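<p>Keeping a dev and a test cluster side by side mostly comes down to giving each cluster its own name and, when ports are published to the host, its own host ports. The following is a minimal sketch of a second cluster config, assuming a hypothetical file name <code>kind-config-test.yaml</code> and host ports 8080/8443 that happen to be free on the machine:</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-yaml" data-lang="yaml"># kind-config-test.yaml (hypothetical file name, not part of the original setup)
kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
name: test                  # can also be supplied with --name on the command line
nodes:
  - role: control-plane
    extraPortMappings:
      - containerPort: 80
        hostPort: 8080      # assumed to be free; must differ from the other cluster
        protocol: TCP
      - containerPort: 443
        hostPort: 8443      # assumed to be free; must differ from the other cluster
        protocol: TCP
  - role: worker
</code></pre></div>
<p>It would be used as <code>kind create cluster --config kind-config-test.yaml</code>, after which kubectl sees the cluster under the context <code>kind-test</code>.</p>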
<h2 id="三环境搭建">3. Environment setup</h2>
<h3 id="31-安装工具">3.1 Installing the tools</h3>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="cl"><span class="c1"># Install Docker</span>
</span></span><span class="line"><span class="cl">curl -fsSL https://get.docker.com <span class="p">|</span> sh
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl"><span class="c1"># Install Kind (Linux amd64 binary; pick the build that matches your OS and architecture)</span>
</span></span><span class="line"><span class="cl">curl -Lo ./kind https://kind.sigs.k8s.io/dl/v0.22.0/kind-linux-amd64
</span></span><span class="line"><span class="cl">chmod +x ./kind
</span></span><span class="line"><span class="cl">sudo mv ./kind /usr/local/bin/kind
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl"><span class="c1"># Install kubectl</span>
</span></span><span class="line"><span class="cl">curl -LO <span class="s2">&#34;https://dl.k8s.io/release/</span><span class="k">$(</span>curl -L -s https://dl.k8s.io/release/stable.txt<span class="k">)</span><span class="s2">/bin/linux/amd64/kubectl&#34;</span>
</span></span><span class="line"><span class="cl">chmod +x kubectl
</span></span><span class="line"><span class="cl">sudo mv kubectl /usr/local/bin/
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl"><span class="c1"># Install Helm</span>
</span></span><span class="line"><span class="cl">curl https://raw.githubusercontent.com/helm/helm/main/scripts/get-helm-3 <span class="p">|</span> bash
</span></span></code></pre></div><h3 id="32-创建集群">3.2 Creating the clusters</h3>
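<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="cl"><span class="c1"># Create the development cluster</span>
</span></span><span class="line"><span class="cl">kind create cluster --name dev --config kind-config.yaml
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl"><span class="c1"># Create the test cluster (isolated environment). Host ports 80/443 are already</span>
</span></span><span class="line"><span class="cl"><span class="c1"># bound by dev, so use a copy of the config with different hostPort values</span>
</span></span><span class="line"><span class="cl"><span class="c1"># (kind-config-test.yaml is a hypothetical file name)</span>
</span></span><span class="line"><span class="cl">kind create cluster --name <span class="nb">test</span> --config kind-config-test.yaml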
</span></span></code></pre></div><h3 id="33-集群配置kind-configyaml">3.3 Cluster configuration (kind-config.yaml)</h3>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-yaml" data-lang="yaml"><span class="line"><span class="cl"><span class="nt">kind</span><span class="p">:</span><span class="w"> </span><span class="l">Cluster</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="nt">apiVersion</span><span class="p">:</span><span class="w"> </span><span class="l">kind.x-k8s.io/v1alpha4</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="nt">nodes</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">  </span>- <span class="nt">role</span><span class="p">:</span><span class="w"> </span><span class="l">control-plane</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">    </span><span class="nt">kubeadmConfigPatches</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">      </span>- <span class="p">|</span><span class="sd">
</span></span></span><span class="line"><span class="cl"><span class="sd">        kind: InitConfiguration
</span></span></span><span class="line"><span class="cl"><span class="sd">        nodeRegistration:
</span></span></span><span class="line"><span class="cl"><span class="sd">          kubeletExtraArgs:
</span></span></span><span class="line"><span class="cl"><span class="sd">            node-labels: &#34;ingress-ready=true&#34;</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">    </span><span class="nt">extraPortMappings</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">      </span>- <span class="nt">containerPort</span><span class="p">:</span><span class="w"> </span><span class="m">80</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">        </span><span class="nt">hostPort</span><span class="p">:</span><span class="w"> </span><span class="m">80</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">        </span><span class="nt">protocol</span><span class="p">:</span><span class="w"> </span><span class="l">TCP</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">      </span>- <span class="nt">containerPort</span><span class="p">:</span><span class="w"> </span><span class="m">443</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">        </span><span class="nt">hostPort</span><span class="p">:</span><span class="w"> </span><span class="m">443</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">        </span><span class="nt">protocol</span><span class="p">:</span><span class="w"> </span><span class="l">TCP</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">  </span>- <span class="nt">role</span><span class="p">:</span><span class="w"> </span><span class="l">worker</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">  </span>- <span class="nt">role</span><span class="p">:</span><span class="w"> </span><span class="l">worker</span><span class="w">
</span></span></span></code></pre></div><h2 id="四核心组件部署">4. Deploying core components</h2>
<h3 id="41-ingress-controller">4.1 Ingress Controller</h3>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="cl"><span class="c1"># Deploy the NGINX Ingress controller</span>
</span></span><span class="line"><span class="cl">kubectl apply -f https://raw.githubusercontent.com/kubernetes/ingress-nginx/main/deploy/static/provider/kind/deploy.yaml
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl"><span class="c1"># Verify: wait until the controller Pod is ready</span>
</span></span><span class="line"><span class="cl">kubectl <span class="nb">wait</span> --namespace ingress-nginx <span class="se">\
</span></span></span><span class="line"><span class="cl">  --for<span class="o">=</span><span class="nv">condition</span><span class="o">=</span>ready pod <span class="se">\
</span></span></span><span class="line"><span class="cl">  --selector<span class="o">=</span>app.kubernetes.io/component<span class="o">=</span>controller <span class="se">\
</span></span></span><span class="line"><span class="cl">  --timeout<span class="o">=</span>90s
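</span></span></code></pre></div><h3 id="42-本地-dns">4.2 Local DNS</h3>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="cl"><span class="c1"># Kind already runs CoreDNS in kube-system; inspect its config rather than reinstalling it</span>
</span></span><span class="line"><span class="cl">kubectl -n kube-system get configmap coredns -o yaml
</span></span><span class="line"><span class="cl"><span class="c1"># Tune the Corefile in place if needed</span>
</span></span><span class="line"><span class="cl">kubectl -n kube-system edit configmap coredns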
</span></span></code></pre></div><h3 id="43-存储类">4.3 Storage class</h3>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-yaml" data-lang="yaml"><span class="line"><span class="cl"><span class="c"># local-path-storage.yaml</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="nt">apiVersion</span><span class="p">:</span><span class="w"> </span><span class="l">storage.k8s.io/v1</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="nt">kind</span><span class="p">:</span><span class="w"> </span><span class="l">StorageClass</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="nt">metadata</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">  </span><span class="nt">name</span><span class="p">:</span><span class="w"> </span><span class="l">local-path</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="nt">provisioner</span><span class="p">:</span><span class="w"> </span><span class="l">rancher.io/local-path</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="nt">volumeBindingMode</span><span class="p">:</span><span class="w"> </span><span class="l">WaitForFirstConsumer</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="nt">reclaimPolicy</span><span class="p">:</span><span class="w"> </span><span class="l">Delete</span><span class="w">
</span></span></span></code></pre></div><div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="cl">kubectl apply -f local-path-storage.yaml
</span></span></code></pre></div><h2 id="五开发工作流">5. Development workflow</h2>
<h3 id="51-镜像构建与加载">5.1 Building and loading images</h3>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="cl"><span class="c1"># Build the application image with Docker (Kind has no built-in registry)</span>
</span></span><span class="line"><span class="cl">docker build -t myapp:dev .
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl"><span class="c1"># Then load it directly into the cluster nodes</span>
</span></span><span class="line"><span class="cl">kind load docker-image myapp:dev --name dev
</span></span></code></pre></div><h3 id="52-热重载开发">5.2 Hot-reload development</h3>
<p>Use <strong>Telepresence</strong> to route a service's cluster traffic to a process running on the local machine, so code changes take effect without rebuilding the image:</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="cl"><span class="c1"># Install Telepresence</span>
</span></span><span class="line"><span class="cl">brew install telepresence  <span class="c1"># macOS</span>
</span></span><span class="line"><span class="cl"><span class="c1"># or</span>
</span></span><span class="line"><span class="cl">curl -fL https://app.gettelepresence.io/download/linux/binary &gt; telepresence <span class="o">&amp;&amp;</span> chmod +x telepresence <span class="o">&amp;&amp;</span> sudo mv telepresence /usr/local/bin/
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl"><span class="c1"># Connect to the cluster, then intercept the service's traffic</span>
</span></span><span class="line"><span class="cl">telepresence connect
</span></span><span class="line"><span class="cl">telepresence intercept myapp --port 3000:3000
</span></span></code></pre></div><h3 id="53-端口转发">5.3 Port forwarding</h3>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="cl"><span class="c1"># Temporary port forward</span>
</span></span><span class="line"><span class="cl">kubectl port-forward svc/myapp 3000:3000 -n dev
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl"><span class="c1"># Or shorten it with a shell alias, in the style of kubectl-aliases</span>
</span></span><span class="line"><span class="cl"><span class="nb">alias</span> <span class="nv">kpf</span><span class="o">=</span><span class="s1">&#39;kubectl port-forward&#39;</span>
</span></span><span class="line"><span class="cl">kpf svc/myapp 3000:3000 -n dev
</span></span></code></pre></div><h2 id="六配置管理">6. Configuration management</h2>
<h3 id="61-configmap-示例">6.1 ConfigMap example</h3>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-yaml" data-lang="yaml"><span class="line"><span class="cl"><span class="c"># configmap.yaml</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="nt">apiVersion</span><span class="p">:</span><span class="w"> </span><span class="l">v1</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="nt">kind</span><span class="p">:</span><span class="w"> </span><span class="l">ConfigMap</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="nt">metadata</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">  </span><span class="nt">name</span><span class="p">:</span><span class="w"> </span><span class="l">app-config</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">  </span><span class="nt">namespace</span><span class="p">:</span><span class="w"> </span><span class="l">dev</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="nt">data</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">  </span><span class="nt">NODE_ENV</span><span class="p">:</span><span class="w"> </span><span class="s2">&#34;development&#34;</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">  </span><span class="nt">LOG_LEVEL</span><span class="p">:</span><span class="w"> </span><span class="s2">&#34;debug&#34;</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">  </span><span class="nt">API_ENDPOINT</span><span class="p">:</span><span class="w"> </span><span class="s2">&#34;http://api.dev.local&#34;</span><span class="w">
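</span></span></span></code></pre></div><div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="cl">kubectl create namespace dev  <span class="c1"># skip if the namespace already exists</span>
</span></span><span class="line"><span class="cl">kubectl apply -f configmap.yaml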
</span></span></code></pre></div><h3 id="62-secret-管理">6.2 Secret management</h3>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="cl"><span class="c1"># Create a Secret</span>
</span></span><span class="line"><span class="cl">kubectl create secret generic db-credentials <span class="se">\
</span></span></span><span class="line"><span class="cl">  --from-literal<span class="o">=</span><span class="nv">username</span><span class="o">=</span>app <span class="se">\
</span></span></span><span class="line"><span class="cl">  --from-literal<span class="o">=</span><span class="nv">password</span><span class="o">=</span>secret <span class="se">\
</span></span></span><span class="line"><span class="cl">  -n dev
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl"><span class="c1"># Or use the Helm Secrets plugin</span>
</span></span><span class="line"><span class="cl">helm secrets install my-release ./charts/myapp <span class="se">\
</span></span></span><span class="line"><span class="cl">  --set db.password<span class="o">=</span><span class="k">$(</span>cat .secrets/db-password<span class="k">)</span>
</span></span></code></pre></div><h2 id="七调试技巧">7. Debugging tips</h2>
<h3 id="71-快速查看日志">7.1 Viewing logs quickly</h3>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="cl"><span class="c1"># Follow the Pod logs</span>
</span></span><span class="line"><span class="cl">kubectl logs -f deployment/myapp -n dev
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl"><span class="c1"># Show the logs of the previous container instance (after a restart)</span>
</span></span><span class="line"><span class="cl">kubectl logs deployment/myapp -n dev --previous
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl"><span class="c1"># Follow a specific container</span>
</span></span><span class="line"><span class="cl">kubectl logs -f deployment/myapp -c sidecar -n dev
</span></span></code></pre></div><h3 id="72-进入容器调试">7.2 Debugging inside a container</h3>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="cl"><span class="c1"># Open a shell in the container</span>
</span></span><span class="line"><span class="cl">kubectl <span class="nb">exec</span> -it deployment/myapp -n dev -- /bin/sh
</span></span><span class="line"><span class="cl">
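</span></span><span class="line"><span class="cl"><span class="c1"># Or attach an ephemeral debug container; kubectl debug takes a Pod, not a Deployment</span>
</span></span><span class="line"><span class="cl"><span class="c1"># (assumes the Pods carry the label app=myapp)</span>
</span></span><span class="line"><span class="cl"><span class="nv">POD</span><span class="o">=</span><span class="k">$(</span>kubectl get pod -n dev -l <span class="nv">app</span><span class="o">=</span>myapp -o <span class="nv">jsonpath</span><span class="o">=</span><span class="s1">&#39;{.items[0].metadata.name}&#39;</span><span class="k">)</span>
</span></span><span class="line"><span class="cl">kubectl debug -it <span class="s2">&#34;</span><span class="nv">$POD</span><span class="s2">&#34;</span> -n dev --image<span class="o">=</span>busybox --target<span class="o">=</span>myapp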
</span></span></code></pre></div><h3 id="73-资源监控">7.3 Resource monitoring</h3>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="cl"><span class="c1"># Show resource usage (requires metrics-server, which Kind does not install by default)</span>
</span></span><span class="line"><span class="cl">kubectl top pods -n dev
</span></span><span class="line"><span class="cl">kubectl top nodes
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl"><span class="c1"># Show events, oldest first</span>
</span></span><span class="line"><span class="cl">kubectl get events -n dev --sort-by<span class="o">=</span><span class="s1">&#39;.lastTimestamp&#39;</span>
</span></span></code></pre></div><h2 id="八cicd-集成">8. CI/CD integration</h2>
<h3 id="81-github-actions-示例">8.1 GitHub Actions example</h3>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-yaml" data-lang="yaml"><span class="line"><span class="cl"><span class="c"># .github/workflows/test.yml</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="nt">name</span><span class="p">:</span><span class="w"> </span><span class="l">Test</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="nt">on</span><span class="p">:</span><span class="w"> </span><span class="p">[</span><span class="l">push]</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="nt">jobs</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">  </span><span class="nt">test</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">    </span><span class="nt">runs-on</span><span class="p">:</span><span class="w"> </span><span class="l">ubuntu-latest</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">    </span><span class="nt">steps</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">      </span>- <span class="nt">uses</span><span class="p">:</span><span class="w"> </span><span class="l">actions/checkout@v4</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">      
</span></span></span><span class="line"><span class="cl"><span class="w">      </span>- <span class="nt">name</span><span class="p">:</span><span class="w"> </span><span class="l">Setup Kind</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">        </span><span class="nt">uses</span><span class="p">:</span><span class="w"> </span><span class="l">helm/kind-action@v1</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">        </span><span class="nt">with</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">          </span><span class="nt">config</span><span class="p">:</span><span class="w"> </span><span class="l">kind-config.yaml</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">      
</span></span></span><span class="line"><span class="cl"><span class="w">      </span>- <span class="nt">name</span><span class="p">:</span><span class="w"> </span><span class="l">Deploy</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">        </span><span class="nt">run</span><span class="p">:</span><span class="w"> </span><span class="p">|</span><span class="sd">
</span></span></span><span class="line"><span class="cl"><span class="sd">          kubectl apply -f k8s/
</span></span></span><span class="line"><span class="cl"><span class="sd">          kubectl wait --for=condition=ready pod -l app=myapp --timeout=120s</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">      
</span></span></span><span class="line"><span class="cl"><span class="w">      </span>- <span class="nt">name</span><span class="p">:</span><span class="w"> </span><span class="l">Run Tests</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">        </span><span class="nt">run</span><span class="p">:</span><span class="w"> </span><span class="l">npm test</span><span class="w">
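</span></span></span></code></pre></div><h2 id="九总结">9. Summary</h2>
<p>The core value of a local Kubernetes development environment:</p>
<ol>
<li><strong>Consistency</strong>: local behaviour stays close to production, so deployments bring fewer surprises</li>
<li><strong>Fast iteration</strong>: quick startup and low resource usage</li>
<li><strong>Full feature set</strong>: supports native K8s features such as ConfigMap, Ingress, and HPA</li>
<li><strong>CI/CD parity</strong>: the same toolchain is used locally and in CI</li>
</ol>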
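<p>The Ingress and HPA support in item 3 can be tried directly on the Kind cluster. The manifests below are a minimal sketch rather than the configuration used in this project: they assume a Deployment and a Service both named <code>myapp</code> in the <code>dev</code> namespace, a Service port of 3000, and a hypothetical hostname <code>myapp.dev.local</code> that resolves to 127.0.0.1.</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-yaml" data-lang="yaml"># ingress-hpa.yaml (hypothetical file name)
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: myapp
  namespace: dev
spec:
  ingressClassName: nginx        # the class created by the ingress-nginx manifest
  rules:
    - host: myapp.dev.local      # hypothetical hostname
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: myapp      # assumed Service name
                port:
                  number: 3000   # assumed Service port
---
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: myapp
  namespace: dev
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: myapp                  # assumed Deployment name
  minReplicas: 1
  maxReplicas: 3
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70 # scale out above 70% of the requested CPU
</code></pre></div>
<p>The HPA only reports metrics once metrics-server is running in the cluster and the Deployment's containers declare CPU requests. Kind does not install metrics-server by default.</p>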
<p><strong>Recommended setup</strong>:</p>
<table>
	<thead>
			<tr>
					<th>Scenario</th>
					<th>Recommended tool</th>
			</tr>
	</thead>
	<tbody>
			<tr>
					<td>Quick start</td>
					<td>Docker Desktop K8s</td>
			</tr>
			<tr>
					<td>Day-to-day development</td>
					<td>Kind</td>
			</tr>
			<tr>
					<td>Multi-cluster isolation</td>
					<td>Kind + multiple clusters</td>
			</tr>
			<tr>
					<td>Production rehearsal</td>
					<td>K3s</td>
			</tr>
	</tbody>
</table>
<hr>
<blockquote>
<p><strong>Update note</strong>: This article is based on hands-on work in May 2026. Tool versions may change over time, so defer to the official documentation.</p>
</blockquote>
]]></content:encoded></item><item><title>Docker Compose Multi-Environment Management: An Elegant Path from Development to Production</title><link>https://www.chaoyuewang.cn/posts/ops/docker-compose-multi-environment/</link><pubDate>Thu, 28 May 2026 10:20:00 +0800</pubDate><guid>https://www.chaoyuewang.cn/posts/ops/docker-compose-multi-environment/</guid><description>&lt;h2 id="前言"&gt;Preface&lt;/h2&gt;
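&lt;p&gt;In 2025 I went through three production incidents caused by environment inconsistencies. Each investigation took hours, and each time the root cause turned out to be a configuration difference between the development and production environments.&lt;/p&gt;
&lt;p&gt;From then on, I started to systematically rework how I manage multiple environments. This article documents the whole practice, including directory structure, configuration management, and the deployment flow.&lt;/p&gt;
&lt;h2 id="一问题根源"&gt;1. Root of the problem&lt;/h2&gt;
&lt;h3 id="11-常见痛点"&gt;1.1 Common pain points&lt;/h3&gt;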
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Problem&lt;/th&gt;
&lt;th&gt;Symptom&lt;/th&gt;
&lt;th&gt;Impact&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Hard-coded configuration&lt;/td&gt;
&lt;td&gt;Environment variables written directly into docker-compose.yml&lt;/td&gt;
&lt;td&gt;Switching environments requires editing the file&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Inconsistent image versions&lt;/td&gt;
&lt;td&gt;Development uses latest, production pins a specific version&lt;/td&gt;
&lt;td&gt;Inconsistent behaviour&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Missing dependency management&lt;/td&gt;
&lt;td&gt;Database migration scripts are not versioned&lt;/td&gt;
&lt;td&gt;Inconsistent data&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Poor secret management&lt;/td&gt;
&lt;td&gt;Sensitive data stored in plain text&lt;/td&gt;
&lt;td&gt;Security risk&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
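&lt;h3 id="12-根本原因"&gt;1.2 Root cause&lt;/h3&gt;
&lt;p&gt;&lt;strong&gt;Incomplete environment isolation&lt;/strong&gt;: development, testing, and production share one configuration template and are told apart only by comments.&lt;/p&gt;
&lt;h2 id="二目录结构设计"&gt;2. Directory structure design&lt;/h2&gt;
&lt;h3 id="21-推荐结构"&gt;2.1 Recommended structure&lt;/h3&gt;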
&lt;pre tabindex="0"&gt;&lt;code&gt;project/
├── docker-compose.yml              # base configuration (shared parts)
├── docker-compose.override.yml     # local development overrides
├── environments/
│   ├── dev/
│   │   ├── docker-compose.dev.yml
│   │   └── .env.dev
│   ├── staging/
│   │   ├── docker-compose.staging.yml
│   │   └── .env.staging
│   └── prod/
│       ├── docker-compose.prod.yml
│       └── .env.prod
├── scripts/
│   ├── deploy.sh
│   └── rollback.sh
└── .gitignore
&lt;/code&gt;&lt;/pre&gt;&lt;h3 id="22-基础配置docker-composeyml"&gt;2.2 Base configuration (docker-compose.yml)&lt;/h3&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" class="chroma"&gt;&lt;code class="language-yaml" data-lang="yaml"&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="nt"&gt;version&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;&amp;#34;3.8&amp;#34;&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="nt"&gt;services&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="nt"&gt;app&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="nt"&gt;image&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l"&gt;${APP_IMAGE:-myapp:latest}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="nt"&gt;restart&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l"&gt;unless-stopped&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="nt"&gt;environment&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt;      &lt;/span&gt;- &lt;span class="l"&gt;NODE_ENV=${NODE_ENV:-development}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt;      &lt;/span&gt;- &lt;span class="l"&gt;LOG_LEVEL=${LOG_LEVEL:-info}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="nt"&gt;depends_on&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt;      &lt;/span&gt;- &lt;span class="l"&gt;db&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt;      &lt;/span&gt;- &lt;span class="l"&gt;redis&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="nt"&gt;db&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="nt"&gt;image&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l"&gt;postgres:${POSTGRES_VERSION:-16}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="nt"&gt;volumes&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt;      &lt;/span&gt;- &lt;span class="l"&gt;db_data:/var/lib/postgresql/data&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="nt"&gt;environment&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt;      &lt;/span&gt;- &lt;span class="l"&gt;POSTGRES_DB=${POSTGRES_DB:-app}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt;      &lt;/span&gt;- &lt;span class="l"&gt;POSTGRES_USER=${POSTGRES_USER:-app}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt;      &lt;/span&gt;- &lt;span class="l"&gt;POSTGRES_PASSWORD_FILE=/run/secrets/db_password&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="nt"&gt;redis&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="nt"&gt;image&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l"&gt;redis:${REDIS_VERSION:-7-alpine}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="nt"&gt;command&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l"&gt;redis-server --maxmemory 256mb&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="nt"&gt;volumes&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="nt"&gt;db_data&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;h3 id="23-生产环境覆盖environmentsproddocker-composeprodyml"&gt;2.3 Production override (environments/prod/docker-compose.prod.yml)&lt;/h3&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" class="chroma"&gt;&lt;code class="language-yaml" data-lang="yaml"&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="nt"&gt;version&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;&amp;#34;3.8&amp;#34;&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="nt"&gt;services&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="nt"&gt;app&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="nt"&gt;image&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l"&gt;myapp:${APP_VERSION:-1.0.0}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="nt"&gt;deploy&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt;      &lt;/span&gt;&lt;span class="nt"&gt;resources&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="nt"&gt;limits&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt;          &lt;/span&gt;&lt;span class="nt"&gt;cpus&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;&amp;#34;2&amp;#34;&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt;          &lt;/span&gt;&lt;span class="nt"&gt;memory&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l"&gt;2G&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="nt"&gt;reservations&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt;          &lt;/span&gt;&lt;span class="nt"&gt;memory&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l"&gt;512M&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;healthcheck&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;test&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;&amp;#34;CMD&amp;#34;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;&amp;#34;curl&amp;#34;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;&amp;#34;-f&amp;#34;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;&amp;#34;http://localhost:3000/health&amp;#34;&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;interval&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l"&gt;30s&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;timeout&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l"&gt;10s&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;retries&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="m"&gt;3&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;secrets&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;- &lt;span class="l"&gt;db_password&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;networks&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;- &lt;span class="l"&gt;frontend&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;- &lt;span class="l"&gt;backend&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;db&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;deploy&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;resources&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;limits&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;memory&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l"&gt;4G&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;volumes&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;- &lt;span class="l"&gt;db_data:/var/lib/postgresql/data&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;- &lt;span class="l"&gt;./backups:/backups&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="nt"&gt;secrets&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;db_password&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;external&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="nt"&gt;networks&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;frontend&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;driver&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l"&gt;bridge&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;backend&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;internal&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;h2 id="三环境变量管理"&gt;3. Environment Variable Management&lt;/h2&gt;
&lt;h3 id="31-env-文件规范"&gt;3.1 .env File Conventions&lt;/h3&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" class="chroma"&gt;&lt;code class="language-bash" data-lang="bash"&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="c1"&gt;# .env.prod&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="c1"&gt;# Application settings&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="nv"&gt;APP_VERSION&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;1.0.0
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="nv"&gt;NODE_ENV&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;production
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="nv"&gt;LOG_LEVEL&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;warn
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="c1"&gt;# Database&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="nv"&gt;POSTGRES_VERSION&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="m"&gt;16&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="nv"&gt;POSTGRES_DB&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;app_prod
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="nv"&gt;POSTGRES_USER&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;app
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="c1"&gt;# Redis&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="nv"&gt;REDIS_VERSION&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;7-alpine
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="c1"&gt;# Image registry&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="nv"&gt;REGISTRY_URL&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;registry.example.com
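&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;h3 id="32-密钥管理"&gt;3.2 Secret Management&lt;/h3&gt;
&lt;p&gt;&lt;strong&gt;Do not store secrets in .env files!&lt;/strong&gt;&lt;/p&gt;</description><content:encoded><![CDATA[<h2 id="前言">Preface</h2>
<p>In 2025, I went through three production incidents caused by inconsistent environments. Each one took hours to troubleshoot, and each time the root cause turned out to be a configuration difference between development and production.</p>
<p>After that, I began systematically reworking how we manage multiple environments. This article documents the full process, covering directory structure, configuration management, and the deployment workflow.</p>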
<h2 id="一问题根源">1. Root Causes</h2>
<h3 id="11-常见痛点">1.1 Common Pain Points</h3>
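<table>
	<thead>
			<tr>
					<th>Problem</th>
					<th>Symptom</th>
					<th>Impact</th>
			</tr>
	</thead>
	<tbody>
			<tr>
					<td>Hard-coded configuration</td>
					<td>Environment variables are written directly into docker-compose.yml</td>
					<td>Switching environments requires editing the file</td>
			</tr>
			<tr>
					<td>Inconsistent image versions</td>
					<td>Development uses latest, production pins a specific version</td>
					<td>Behavior differs between environments</td>
			</tr>
			<tr>
					<td>Missing dependency management</td>
					<td>Database migration scripts are not versioned</td>
					<td>Data inconsistency</td>
			</tr>
			<tr>
					<td>Poor secret management</td>
					<td>Sensitive data is stored in plaintext</td>
					<td>Security risk</td>
			</tr>
	</tbody>
</table>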
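<h3 id="12-根本原因">1.2 Root Cause</h3>
<p><strong>Incomplete environment isolation</strong>: development, testing, and production share a single configuration template and are told apart only by comments.</p>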
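<p>One way to catch this kind of drift early is to compare which variables each environment file defines. The following is a minimal sketch, not part of the original setup: the function names <code>env_keys</code> and <code>env_drift</code> are hypothetical, and it assumes plain <code>KEY=value</code> lines without <code>export</code> prefixes.</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"># Hypothetical helper: print the variable names defined in a .env-style file.
env_keys() {
  # Keep KEY=value lines only, drop the values, de-duplicate.
  grep -E '^[A-Za-z_][A-Za-z0-9_]*=' "$1" | cut -d= -f1 | sort -u
}

# Report names that appear in only one of the two files.
# Returns 1 when drift is found, 0 otherwise.
env_drift() {
  local only
  only="$( { env_keys "$1"; env_keys "$2"; } | sort | uniq -u )"
  if [ -n "$only" ]; then
    echo "Variables defined in only one file:"
    echo "$only"
    return 1
  fi
  echo "No drift: both files define the same variables."
}
</code></pre></div>
<p>For example, <code>env_drift .env.dev .env.prod</code> returns a non-zero status when the two files define different variable names, so it could gate a deployment. It compares names only, not values, because values are expected to differ between environments.</p>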
<h2 id="二目录结构设计">2. Directory Structure Design</h2>
<h3 id="21-推荐结构">2.1 Recommended Layout</h3>
<pre tabindex="0"><code>project/
├── docker-compose.yml          # Base configuration (shared parts)
├── docker-compose.override.yml # Local development overrides
├── environments/
│   ├── dev/
│   │   ├── docker-compose.dev.yml
│   │   └── .env.dev
│   ├── staging/
│   │   ├── docker-compose.staging.yml
│   │   └── .env.staging
│   └── prod/
│       ├── docker-compose.prod.yml
│       └── .env.prod
├── scripts/
│   ├── deploy.sh
│   └── rollback.sh
└── .gitignore
</code></pre><h3 id="22-基础配置docker-composeyml">2.2 Base Configuration (docker-compose.yml)</h3>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-yaml" data-lang="yaml"><span class="line"><span class="cl"><span class="nt">version</span><span class="p">:</span><span class="w"> </span><span class="s2">&#34;3.8&#34;</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="nt">services</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">  </span><span class="nt">app</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">    </span><span class="nt">image</span><span class="p">:</span><span class="w"> </span><span class="l">${APP_IMAGE:-myapp:latest}</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">    </span><span class="nt">restart</span><span class="p">:</span><span class="w"> </span><span class="l">unless-stopped</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">    </span><span class="nt">environment</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">      </span>- <span class="l">NODE_ENV=${NODE_ENV:-development}</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">      </span>- <span class="l">LOG_LEVEL=${LOG_LEVEL:-info}</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">    </span><span class="nt">depends_on</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">      </span>- <span class="l">db</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">      </span>- <span class="l">redis</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">  </span><span class="nt">db</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">    </span><span class="nt">image</span><span class="p">:</span><span class="w"> </span><span class="l">postgres:${POSTGRES_VERSION:-16}</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">    </span><span class="nt">volumes</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">      </span>- <span class="l">db_data:/var/lib/postgresql/data</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">    </span><span class="nt">environment</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">      </span>- <span class="l">POSTGRES_DB=${POSTGRES_DB:-app}</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">      </span>- <span class="l">POSTGRES_USER=${POSTGRES_USER:-app}</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">      </span>- <span class="l">POSTGRES_PASSWORD_FILE=/run/secrets/db_password</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">  </span><span class="nt">redis</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">    </span><span class="nt">image</span><span class="p">:</span><span class="w"> </span><span class="l">redis:${REDIS_VERSION:-7-alpine}</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">    </span><span class="nt">command</span><span class="p">:</span><span class="w"> </span><span class="l">redis-server --maxmemory 256mb</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="nt">volumes</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">  </span><span class="nt">db_data</span><span class="p">:</span><span class="w">
</span></span></span></code></pre></div><h3 id="23-生产环境覆盖environmentsproddocker-composeprodyml">2.3 Production Override (environments/prod/docker-compose.prod.yml)</h3>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-yaml" data-lang="yaml"><span class="line"><span class="cl"><span class="nt">version</span><span class="p">:</span><span class="w"> </span><span class="s2">&#34;3.8&#34;</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="nt">services</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">  </span><span class="nt">app</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">    </span><span class="nt">image</span><span class="p">:</span><span class="w"> </span><span class="l">myapp:${APP_VERSION:-1.0.0}</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">    </span><span class="nt">deploy</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">      </span><span class="nt">resources</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">        </span><span class="nt">limits</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">          </span><span class="nt">cpus</span><span class="p">:</span><span class="w"> </span><span class="s2">&#34;2&#34;</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">          </span><span class="nt">memory</span><span class="p">:</span><span class="w"> </span><span class="l">2G</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">        </span><span class="nt">reservations</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">          </span><span class="nt">memory</span><span class="p">:</span><span class="w"> </span><span class="l">512M</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">    </span><span class="nt">healthcheck</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">      </span><span class="nt">test</span><span class="p">:</span><span class="w"> </span><span class="p">[</span><span class="s2">&#34;CMD&#34;</span><span class="p">,</span><span class="w"> </span><span class="s2">&#34;curl&#34;</span><span class="p">,</span><span class="w"> </span><span class="s2">&#34;-f&#34;</span><span class="p">,</span><span class="w"> </span><span class="s2">&#34;http://localhost:3000/health&#34;</span><span class="p">]</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">      </span><span class="nt">interval</span><span class="p">:</span><span class="w"> </span><span class="l">30s</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">      </span><span class="nt">timeout</span><span class="p">:</span><span class="w"> </span><span class="l">10s</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">      </span><span class="nt">retries</span><span class="p">:</span><span class="w"> </span><span class="m">3</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">    </span><span class="nt">secrets</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">      </span>- <span class="l">db_password</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">    </span><span class="nt">networks</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">      </span>- <span class="l">frontend</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">      </span>- <span class="l">backend</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">  </span><span class="nt">db</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">    </span><span class="nt">deploy</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">      </span><span class="nt">resources</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">        </span><span class="nt">limits</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">          </span><span class="nt">memory</span><span class="p">:</span><span class="w"> </span><span class="l">4G</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">    </span><span class="nt">volumes</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">      </span>- <span class="l">db_data:/var/lib/postgresql/data</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">      </span>- <span class="l">./backups:/backups</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">    </span><span class="c"># db reads POSTGRES_PASSWORD_FILE, so the secret must be mounted here too</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">    </span><span class="nt">secrets</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">      </span>- <span class="l">db_password</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">    </span><span class="c"># app only joins frontend and backend, so db must be on backend to be reachable</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">    </span><span class="nt">networks</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">      </span>- <span class="l">backend</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">  </span><span class="nt">redis</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">    </span><span class="nt">networks</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">      </span>- <span class="l">backend</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="nt">secrets</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">  </span><span class="nt">db_password</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">    </span><span class="nt">external</span><span class="p">:</span><span class="w"> </span><span class="kc">true</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="nt">networks</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">  </span><span class="nt">frontend</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">    </span><span class="nt">driver</span><span class="p">:</span><span class="w"> </span><span class="l">bridge</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">  </span><span class="nt">backend</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">    </span><span class="nt">internal</span><span class="p">:</span><span class="w"> </span><span class="kc">true</span><span class="w">
</span></span></span></code></pre></div><h2 id="三环境变量管理">3. Environment Variable Management</h2>
<h3 id="31-env-文件规范">3.1 .env File Conventions</h3>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="cl"><span class="c1"># .env.prod</span>
</span></span><span class="line"><span class="cl"><span class="c1"># Application settings</span>
</span></span><span class="line"><span class="cl"><span class="nv">APP_VERSION</span><span class="o">=</span>1.0.0
</span></span><span class="line"><span class="cl"><span class="nv">NODE_ENV</span><span class="o">=</span>production
</span></span><span class="line"><span class="cl"><span class="nv">LOG_LEVEL</span><span class="o">=</span>warn
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl"><span class="c1"># Database</span>
</span></span><span class="line"><span class="cl"><span class="nv">POSTGRES_VERSION</span><span class="o">=</span><span class="m">16</span>
</span></span><span class="line"><span class="cl"><span class="nv">POSTGRES_DB</span><span class="o">=</span>app_prod
</span></span><span class="line"><span class="cl"><span class="nv">POSTGRES_USER</span><span class="o">=</span>app
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl"><span class="c1"># Redis</span>
</span></span><span class="line"><span class="cl"><span class="nv">REDIS_VERSION</span><span class="o">=</span>7-alpine
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl"><span class="c1"># Image registry</span>
</span></span><span class="line"><span class="cl"><span class="nv">REGISTRY_URL</span><span class="o">=</span>registry.example.com
</span></span></code></pre></div><h3 id="32-密钥管理">3.2 Secret Management</h3>
<p><strong>Do not store secrets in .env files!</strong></p>
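<p>This rule can be enforced with a check that runs before deployment. The following is a minimal sketch under stated assumptions: the function name <code>check_no_secrets</code> is hypothetical, and the list of secret-looking name fragments is an assumption that should be adapted to your own naming rules. Variables ending in <code>_FILE</code> are skipped because they hold a path to a secret, not the secret itself.</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"># Hypothetical pre-deploy check: fail if a .env file defines secret-looking variables.
check_no_secrets() {
  local hits
  # Match variable names containing a secret-like fragment (assumed list),
  # then ignore *_FILE variables and keep only the names.
  hits="$(grep -E -i '^[A-Za-z0-9_]*(PASSWORD|SECRET|TOKEN|PRIVATE_KEY)[A-Za-z0-9_]*=' "$1" \
    | grep -v -E -i '^[A-Za-z0-9_]*_FILE=' \
    | cut -d= -f1)"
  if [ -n "$hits" ]; then
    echo "Possible secrets found in $1:"
    echo "$hits"
    return 1
  fi
  echo "OK: no secret-looking variables in $1"
}
</code></pre></div>
<p>A name-based check like this only catches obvious cases. It does not inspect values, so it complements a proper secret store and does not replace one.</p>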
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="cl"><span class="c1"># Use Docker secrets (docker secret create requires Swarm mode)</span>
</span></span><span class="line"><span class="cl"><span class="nb">echo</span> <span class="s2">&#34;your-secure-password&#34;</span> <span class="p">|</span> docker secret create db_password -
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl"><span class="c1"># Or use a Secret in Kubernetes</span>
</span></span><span class="line"><span class="cl">kubectl create secret generic db-credentials --from-literal<span class="o">=</span><span class="nv">password</span><span class="o">=</span>your-secure-password
</span></span></code></pre></div><h2 id="四部署脚本">4. Deployment Scripts</h2>
<h3 id="41-部署脚本scriptsdeploysh">4.1 Deploy Script (scripts/deploy.sh)</h3>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="cl"><span class="cp">#!/bin/bash
</span></span></span><span class="line"><span class="cl"><span class="nb">set</span> -euo pipefail
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl"><span class="nv">ENV</span><span class="o">=</span><span class="si">${</span><span class="nv">1</span><span class="k">:-</span><span class="nv">dev</span><span class="si">}</span>
</span></span><span class="line"><span class="cl"><span class="nv">PROJECT_DIR</span><span class="o">=</span><span class="s2">&#34;</span><span class="k">$(</span><span class="nb">cd</span> <span class="s2">&#34;</span><span class="k">$(</span>dirname <span class="s2">&#34;</span><span class="si">${</span><span class="nv">BASH_SOURCE</span><span class="p">[0]</span><span class="si">}</span><span class="s2">&#34;</span><span class="k">)</span><span class="s2">/..&#34;</span> <span class="o">&amp;&amp;</span> <span class="nb">pwd</span><span class="k">)</span><span class="s2">&#34;</span>
</span></span><span class="line"><span class="cl">
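</span></span><span class="line"><span class="cl"><span class="nb">echo</span> <span class="s2">&#34;🚀 Deploying to environment: </span><span class="si">${</span><span class="nv">ENV</span><span class="si">}</span><span class="s2">&#34;</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl"><span class="c1"># Run from the project root so the relative compose paths below resolve</span>
</span></span><span class="line"><span class="cl"><span class="nb">cd</span> <span class="s2">&#34;</span><span class="si">${</span><span class="nv">PROJECT_DIR</span><span class="si">}</span><span class="s2">&#34;</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl"><span class="c1"># 1. Load environment variables</span>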
</span></span><span class="line"><span class="cl"><span class="nb">set</span> -a
</span></span><span class="line"><span class="cl"><span class="nb">source</span> <span class="s2">&#34;</span><span class="si">${</span><span class="nv">PROJECT_DIR</span><span class="si">}</span><span class="s2">/environments/</span><span class="si">${</span><span class="nv">ENV</span><span class="si">}</span><span class="s2">/.env.</span><span class="si">${</span><span class="nv">ENV</span><span class="si">}</span><span class="s2">&#34;</span>
</span></span><span class="line"><span class="cl"><span class="nb">set</span> +a
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl"><span class="c1"># 2. Pull the latest images</span>
</span></span><span class="line"><span class="cl">docker compose -f docker-compose.yml <span class="se">\
</span></span></span><span class="line"><span class="cl">               -f environments/<span class="si">${</span><span class="nv">ENV</span><span class="si">}</span>/docker-compose.<span class="si">${</span><span class="nv">ENV</span><span class="si">}</span>.yml <span class="se">\
</span></span></span><span class="line"><span class="cl">               pull
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl"><span class="c1"># 3. Run database migrations</span>
</span></span><span class="line"><span class="cl">docker compose -f docker-compose.yml <span class="se">\
</span></span></span><span class="line"><span class="cl">               -f environments/<span class="si">${</span><span class="nv">ENV</span><span class="si">}</span>/docker-compose.<span class="si">${</span><span class="nv">ENV</span><span class="si">}</span>.yml <span class="se">\
</span></span></span><span class="line"><span class="cl">               run --rm app npm run migrate
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl"><span class="c1"># 4. Start services</span>
</span></span><span class="line"><span class="cl">docker compose -f docker-compose.yml <span class="se">\
</span></span></span><span class="line"><span class="cl">               -f environments/<span class="si">${</span><span class="nv">ENV</span><span class="si">}</span>/docker-compose.<span class="si">${</span><span class="nv">ENV</span><span class="si">}</span>.yml <span class="se">\
</span></span></span><span class="line"><span class="cl">               up -d --remove-orphans
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl"><span class="c1"># 5. Health check (wait briefly, then list container status)</span>
</span></span><span class="line"><span class="cl">sleep <span class="m">10</span>
</span></span><span class="line"><span class="cl">docker compose -f docker-compose.yml <span class="se">\
</span></span></span><span class="line"><span class="cl">               -f environments/<span class="si">${</span><span class="nv">ENV</span><span class="si">}</span>/docker-compose.<span class="si">${</span><span class="nv">ENV</span><span class="si">}</span>.yml <span class="se">\
</span></span></span><span class="line"><span class="cl">               ps
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl"><span class="nb">echo</span> <span class="s2">&#34;✅ 部署完成&#34;</span>
</span></span></code></pre></div><h3 id="42-回滚脚本scriptsrollbacksh">4.2 回滚脚本（scripts/rollback.sh）</h3>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="cl"><span class="cp">#!/bin/bash
</span></span></span><span class="line"><span class="cl"><span class="nb">set</span> -euo pipefail
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl"><span class="nv">ENV</span><span class="o">=</span><span class="si">${</span><span class="nv">1</span><span class="k">:-</span><span class="nv">dev</span><span class="si">}</span>
</span></span><span class="line"><span class="cl"><span class="nv">PROJECT_DIR</span><span class="o">=</span><span class="s2">&#34;</span><span class="k">$(</span><span class="nb">cd</span> <span class="s2">&#34;</span><span class="k">$(</span>dirname <span class="s2">&#34;</span><span class="si">${</span><span class="nv">BASH_SOURCE</span><span class="p">[0]</span><span class="si">}</span><span class="s2">&#34;</span><span class="k">)</span><span class="s2">/..&#34;</span> <span class="o">&amp;&amp;</span> <span class="nb">pwd</span><span class="k">)</span><span class="s2">&#34;</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl"><span class="nb">echo</span> <span class="s2">&#34;⏪ 回滚环境: </span><span class="si">${</span><span class="nv">ENV</span><span class="si">}</span><span class="s2">&#34;</span>
</span></span><span class="line"><span class="cl">
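</span></span><span class="line"><span class="cl"><span class="c1"># Previous version: second-newest local tag of the myapp image (docker images lists newest first)</span>
</span></span><span class="line"><span class="cl"><span class="nv">PREV_VERSION</span><span class="o">=</span><span class="k">$(</span>docker images --format <span class="s2">&#34;{{.Tag}}&#34;</span> myapp <span class="p">|</span> sed -n <span class="s1">&#39;2p&#39;</span><span class="k">)</span>
</span></span><span class="line"><span class="cl"><span class="c1"># Abort when there is no older tag to roll back to</span>
</span></span><span class="line"><span class="cl"><span class="k">if</span> <span class="o">[</span> -z <span class="s2">&#34;</span><span class="si">${</span><span class="nv">PREV_VERSION</span><span class="si">}</span><span class="s2">&#34;</span> <span class="o">]</span><span class="p">;</span> <span class="k">then</span> <span class="nb">echo</span> <span class="s2">&#34;No previous version found&#34;</span><span class="p">;</span> <span class="nb">exit</span> 1<span class="p">;</span> <span class="k">fi</span>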
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl"><span class="c1"># Update the version pinned in the environment file</span>
</span></span><span class="line"><span class="cl">sed -i <span class="s2">&#34;s/APP_VERSION=.*/APP_VERSION=</span><span class="si">${</span><span class="nv">PREV_VERSION</span><span class="si">}</span><span class="s2">/&#34;</span> <span class="se">\
</span></span></span><span class="line"><span class="cl">    <span class="s2">&#34;</span><span class="si">${</span><span class="nv">PROJECT_DIR</span><span class="si">}</span><span class="s2">/environments/</span><span class="si">${</span><span class="nv">ENV</span><span class="si">}</span><span class="s2">/.env.</span><span class="si">${</span><span class="nv">ENV</span><span class="si">}</span><span class="s2">&#34;</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl"><span class="c1"># Redeploy</span>
</span></span><span class="line"><span class="cl"><span class="s2">&#34;</span><span class="si">${</span><span class="nv">PROJECT_DIR</span><span class="si">}</span><span class="s2">/scripts/deploy.sh&#34;</span> <span class="s2">&#34;</span><span class="si">${</span><span class="nv">ENV</span><span class="si">}</span><span class="s2">&#34;</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl"><span class="nb">echo</span> <span class="s2">&#34;✅ 回滚完成至版本: </span><span class="si">${</span><span class="nv">PREV_VERSION</span><span class="si">}</span><span class="s2">&#34;</span>
</span></span></code></pre></div><h2 id="五cicd-集成">五、CI/CD 集成</h2>
<h3 id="51-github-actions-示例">5.1 GitHub Actions 示例</h3>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-yaml" data-lang="yaml"><span class="line"><span class="cl"><span class="c"># .github/workflows/deploy.yml</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="nt">name</span><span class="p">:</span><span class="w"> </span><span class="l">Deploy</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="nt">on</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">  </span><span class="nt">push</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">    </span><span class="nt">branches</span><span class="p">:</span><span class="w"> </span><span class="p">[</span><span class="l">main]</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="nt">jobs</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">  </span><span class="nt">deploy-staging</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">    </span><span class="nt">runs-on</span><span class="p">:</span><span class="w"> </span><span class="l">ubuntu-latest</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">    </span><span class="nt">steps</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">      </span>- <span class="nt">uses</span><span class="p">:</span><span class="w"> </span><span class="l">actions/checkout@v4</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">      </span>- <span class="nt">name</span><span class="p">:</span><span class="w"> </span><span class="l">Deploy to Staging</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">        </span><span class="nt">run</span><span class="p">:</span><span class="w"> </span><span class="l">./scripts/deploy.sh staging</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">        </span><span class="nt">env</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">          </span><span class="nt">DOCKER_REGISTRY_TOKEN</span><span class="p">:</span><span class="w"> </span><span class="l">${{ secrets.DOCKER_TOKEN }}</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">  </span><span class="nt">deploy-prod</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">    </span><span class="nt">needs</span><span class="p">:</span><span class="w"> </span><span class="l">deploy-staging</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">    </span><span class="nt">runs-on</span><span class="p">:</span><span class="w"> </span><span class="l">ubuntu-latest</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">    </span><span class="nt">if</span><span class="p">:</span><span class="w"> </span><span class="l">github.ref == &#39;refs/heads/main&#39;</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">    </span><span class="nt">steps</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">      </span>- <span class="nt">uses</span><span class="p">:</span><span class="w"> </span><span class="l">actions/checkout@v4</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">      </span>- <span class="nt">name</span><span class="p">:</span><span class="w"> </span><span class="l">Deploy to Production</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">        </span><span class="nt">run</span><span class="p">:</span><span class="w"> </span><span class="l">./scripts/deploy.sh prod</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">        </span><span class="nt">env</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">          </span><span class="nt">DOCKER_REGISTRY_TOKEN</span><span class="p">:</span><span class="w"> </span><span class="l">${{ secrets.DOCKER_TOKEN }}</span><span class="w">
</span></span></span></code></pre></div><h2 id="六最佳实践总结">6. Best practices</h2>
<table>
	<thead>
			<tr>
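					<th>Practice</th>
					<th>Notes</th>
			</tr>
	</thead>
	<tbody>
			<tr>
					<td>✅ Separate base config from overrides</td>
					<td>Keep shared settings in docker-compose.yml and per-environment differences in the override files</td>
			</tr>
			<tr>
					<td>✅ Use .env files</td>
					<td>Do not hard-code environment variables</td>
			</tr>
			<tr>
					<td>✅ Use secrets for credentials</td>
					<td>Never commit credentials to version control</td>
			</tr>
			<tr>
					<td>✅ Pin image versions</td>
					<td>Avoid the inconsistencies that the latest tag introduces</td>
			</tr>
			<tr>
					<td>✅ Health checks</td>
					<td>Treat a deployment as successful only once the service is actually reachable</td>
			</tr>
			<tr>
					<td>✅ Rollback plan</td>
					<td>Confirm before every deployment that a fast rollback is possible</td>
			</tr>
			<tr>
					<td>❌ Do not edit production config by hand</td>
					<td>Every change goes through code review</td>
			</tr>
			<tr>
					<td>❌ Do not share .env files across environments</td>
					<td>Each environment gets its own file</td>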
			</tr>
	</tbody>
</table>
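<p>The health-check row can be made concrete with a small retry loop. This is only a sketch: <code>wait_until</code> is a hypothetical helper name, and the port and <code>/healthz</code> path in the usage comment are assumptions about the application rather than anything the deploy script defines. It could stand in for the fixed <code>sleep 10</code> before the final <code>ps</code>.</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash">#!/usr/bin/env bash
# wait_until ATTEMPTS DELAY CMD...
# Runs CMD up to ATTEMPTS times, sleeping DELAY seconds after each failure.
# Returns 0 as soon as CMD succeeds, 1 if every attempt fails.
wait_until() {
  local attempts=$1 delay=$2 i=1
  shift 2
  while [ "$i" -le "$attempts" ]; do
    if "$@"; then
      return 0
    fi
    sleep "$delay"
    i=$((i + 1))
  done
  return 1
}

# Hypothetical usage: poll an assumed health endpoint, 30 attempts 2 seconds apart.
# wait_until 30 2 curl -fsS http://localhost:3000/healthz
</code></pre></div>
<p>Docker Compose v2 also offers <code>docker compose up --wait</code>, which waits until services are running or healthy; it is most useful when the services define a <code>healthcheck</code>.</p>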
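<h2 id="七总结">7. Summary</h2>
<p>The core principles of multi-environment management:</p>
<ol>
<li><strong>Configuration as code</strong>: keep every environment's configuration under version control</li>
<li><strong>Minimal differences</strong>: maximize the shared base configuration and minimize per-environment overrides</li>
<li><strong>Automated deployment</strong>: reduce manual steps to improve consistency</li>
<li><strong>Rollback-ready</strong>: every deployment has a clear rollback path</li>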
</ol>
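<p>One way to keep manual mistakes out of an automated deployment is to validate the environment name before any <code>docker compose</code> command runs. The sketch below assumes the three environment names that appear in this article (<code>dev</code>, <code>staging</code>, <code>prod</code>) and the <code>environments/NAME/docker-compose.NAME.yml</code> layout; <code>validate_env</code> and <code>compose_files</code> are hypothetical helper names.</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash">#!/usr/bin/env bash
# validate_env NAME: succeed only for the environment names used in this article.
validate_env() {
  case "$1" in
    dev|staging|prod) return 0 ;;
    *) return 1 ;;
  esac
}

# compose_files NAME: print the -f arguments (one per line) for the base file
# plus the override file of the given environment.
compose_files() {
  printf '%s\n' -f docker-compose.yml -f "environments/$1/docker-compose.$1.yml"
}

# Hypothetical usage inside deploy.sh:
# validate_env "$ENV" || exit 1
# docker compose $(compose_files "$ENV") config --quiet
</code></pre></div>
<p>Rejecting an unknown name early means a typo such as <code>prd</code> fails immediately instead of partway through a deployment.</p>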
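<p>If environment drift is giving you headaches too, my advice is to <strong>establish conventions early</strong> rather than refactoring only after problems become frequent.</p>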
<hr>
<blockquote>
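<p><strong>Changelog</strong>: This article is based on hands-on work in May 2026. Exact commands and configuration may differ between projects, so adapt them to your actual requirements.</p>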
</blockquote>
]]></content:encoded></item><item><title>SSH密钥持久化：为什么容器内生成的密钥在重启后丢失</title><link>https://www.chaoyuewang.cn/posts/ops/ssh-key-persistence/</link><pubDate>Wed, 27 May 2026 13:00:00 +0800</pubDate><guid>https://www.chaoyuewang.cn/posts/ops/ssh-key-persistence/</guid><description>&lt;h2 id="前言"&gt;前言&lt;/h2&gt;
&lt;p&gt;2026年5月，我遇到一个反复出现的问题：容器内生成的SSH密钥在容器重启后丢失，导致无法通过SSH连接到宿主机。&lt;/p&gt;
&lt;p&gt;这个问题看似简单，但背后涉及Docker容器的文件系统隔离机制。这篇文章记录完整的排查过程和最终解决方案。&lt;/p&gt;
&lt;h2 id="一问题现象"&gt;一、问题现象&lt;/h2&gt;
&lt;pre tabindex="0"&gt;&lt;code&gt;现象：SSH密钥在容器内生成，容器重启后密钥消失，无法连接宿主机
时间：2026-05-18
环境：fnOS虚拟化平台 + Ubuntu 24.04 VM + Docker
&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;&lt;strong&gt;初始错误&lt;/strong&gt;：&lt;/p&gt;
&lt;pre tabindex="0"&gt;&lt;code&gt;Warning: Permanently added &amp;#39;192.168.0.200&amp;#39; (ED25519) to the list of known hosts.
Permission denied (publickey).
&lt;/code&gt;&lt;/pre&gt;&lt;h2 id="二根因分析"&gt;二、根因分析&lt;/h2&gt;
&lt;h3 id="21-docker容器的文件系统隔离"&gt;2.1 Docker容器的文件系统隔离&lt;/h3&gt;
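&lt;p&gt;Docker containers use a &lt;strong&gt;union filesystem (UnionFS)&lt;/strong&gt;: read-only image layers plus a thin writable layer that belongs to a single container. For files created inside the container this means:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;&lt;strong&gt;Files generated inside the container&lt;/strong&gt; → stored in the writable layer of that container&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Container recreated&lt;/strong&gt; → the writable layer is discarded with the old container, and every file not persisted to a volume or bind mount is lost. A plain &lt;code&gt;docker restart&lt;/code&gt; keeps the writable layer; &lt;code&gt;docker compose up&lt;/code&gt; after an image or configuration change, or &lt;code&gt;docker rm&lt;/code&gt; followed by &lt;code&gt;docker run&lt;/code&gt;, creates a new container.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;SSH key lost&lt;/strong&gt; → key-based authentication to the host fails&lt;/li&gt;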
&lt;/ol&gt;
&lt;h3 id="22-为什么宿主机ssh拒绝使用容器内密钥"&gt;2.2 为什么宿主机SSH拒绝使用容器内密钥&lt;/h3&gt;
&lt;p&gt;即使密钥被挂载到容器，宿主机SSH服务也会拒绝使用：&lt;/p&gt;
&lt;pre tabindex="0"&gt;&lt;code&gt;# /var/log/auth.log
sshd[12345]: Authentication refused: bad ownership or modes for key file
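&lt;/code&gt;&lt;/pre&gt;&lt;p&gt;&lt;strong&gt;Cause&lt;/strong&gt;: this message comes from the &lt;code&gt;StrictModes&lt;/code&gt; check in sshd, which inspects the ownership and permissions of the account home directory, &lt;code&gt;~/.ssh&lt;/code&gt; and &lt;code&gt;authorized_keys&lt;/code&gt; on the host. Separately, the ssh client refuses a private key that is owned by the invoking user but readable by group or others, so keep it at &lt;code&gt;600&lt;/code&gt;. A bind mount preserves the numeric UID/GID and mode from the host, so the owner seen inside the container may not match the user running ssh.&lt;/p&gt;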
&lt;h2 id="三解决方案"&gt;三、解决方案&lt;/h2&gt;
&lt;h3 id="31-核心原则"&gt;3.1 核心原则&lt;/h3&gt;
&lt;p&gt;&lt;strong&gt;密钥必须在宿主机生成，不能容器内生成。&lt;/strong&gt;&lt;/p&gt;
&lt;h3 id="32-完整步骤"&gt;3.2 完整步骤&lt;/h3&gt;
&lt;h4 id="步骤1在宿主机生成ssh密钥"&gt;步骤1：在宿主机生成SSH密钥&lt;/h4&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" class="chroma"&gt;&lt;code class="language-bash" data-lang="bash"&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="c1"&gt;# 宿主机执行（192.168.0.200）&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;ssh-keygen -t ed25519 -C &lt;span class="s2"&gt;&amp;#34;hermes-agent&amp;#34;&lt;/span&gt; -f /home/ksboy/.ssh/hermes_key
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="c1"&gt;# 设置权限&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;chmod &lt;span class="m"&gt;600&lt;/span&gt; /home/ksboy/.ssh/hermes_key
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;chmod &lt;span class="m"&gt;644&lt;/span&gt; /home/ksboy/.ssh/hermes_key.pub
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;h4 id="步骤2将公钥添加到宿主机授权文件"&gt;步骤2：将公钥添加到宿主机授权文件&lt;/h4&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" class="chroma"&gt;&lt;code class="language-bash" data-lang="bash"&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="c1"&gt;# 宿主机执行&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;cat /home/ksboy/.ssh/hermes_key.pub &amp;gt;&amp;gt; /home/ksboy/.ssh/authorized_keys
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;chmod &lt;span class="m"&gt;600&lt;/span&gt; /home/ksboy/.ssh/authorized_keys
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;h4 id="步骤3在docker-composeyml中挂载密钥"&gt;步骤3：在docker-compose.yml中挂载密钥&lt;/h4&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" class="chroma"&gt;&lt;code class="language-yaml" data-lang="yaml"&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="nt"&gt;services&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;hermes-agent&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;image&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l"&gt;hermes-agent:latest&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;volumes&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="c"&gt;# 密钥挂载（只读模式）&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;- &lt;span class="l"&gt;/home/ksboy/.ssh/hermes_key:/root/.ssh/id_ed25519:ro&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;- &lt;span class="l"&gt;/home/ksboy/.ssh/hermes_key.pub:/root/.ssh/id_ed25519.pub:ro&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="c"&gt;# SSH配置&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;- &lt;span class="l"&gt;/home/ksboy/.ssh/config:/root/.ssh/config:ro&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;environment&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;- &lt;span class="l"&gt;SSH_HOST=192.168.0.200&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="w"&gt; &lt;/span&gt;- &lt;span class="l"&gt;SSH_USER=ksboy&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;h4 id="步骤4宿主机ssh配置调整"&gt;步骤4：宿主机SSH配置调整&lt;/h4&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" class="chroma"&gt;&lt;code class="language-bash" data-lang="bash"&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="c1"&gt;# /etc/ssh/sshd_config&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="c1"&gt;# 允许Docker网段访问&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;ListenAddress 0.0.0.0
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="c1"&gt;# 重启SSH服务&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;systemctl restart sshd
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;h3 id="33-验证"&gt;3.3 验证&lt;/h3&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" class="chroma"&gt;&lt;code class="language-bash" data-lang="bash"&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="c1"&gt;# 容器内测试&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;ssh -i /root/.ssh/id_ed25519 ksboy@192.168.0.200 &lt;span class="s2"&gt;&amp;#34;echo &amp;#39;连接成功&amp;#39;&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="c1"&gt;# 重启容器后再次测试&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;docker-compose restart hermes-agent
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;ssh -i /root/.ssh/id_ed25519 ksboy@192.168.0.200 &lt;span class="s2"&gt;&amp;#34;echo &amp;#39;重启后连接成功&amp;#39;&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;h2 id="四关键要点"&gt;四、关键要点&lt;/h2&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;要点&lt;/th&gt;
&lt;th&gt;说明&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;密钥生成位置&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;必须在宿主机&lt;/strong&gt;，容器内生成的密钥重启后丢失&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;文件所有者&lt;/td&gt;
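&lt;td&gt;The private key must be readable by the user that runs ssh in the container and closed to group and others (mode &lt;code&gt;600&lt;/code&gt;); a bind mount keeps the numeric UID/GID from the host&lt;/td&gt;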
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;挂载模式&lt;/td&gt;
&lt;td&gt;使用 &lt;code&gt;:ro&lt;/code&gt; 只读模式，防止容器内意外修改&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;SSH监听地址&lt;/td&gt;
&lt;td&gt;宿主机需监听 &lt;code&gt;0.0.0.0&lt;/code&gt;，允许Docker网段访问&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;网络隔离&lt;/td&gt;
&lt;td&gt;容器在Docker网段，宿主机在LAN网段，需正确配置路由&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;h2 id="五常见错误"&gt;五、常见错误&lt;/h2&gt;
&lt;h3 id="错误1容器内生成密钥"&gt;错误1：容器内生成密钥&lt;/h3&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" class="chroma"&gt;&lt;code class="language-bash" data-lang="bash"&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="c1"&gt;# ❌ 错误做法&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;docker &lt;span class="nb"&gt;exec&lt;/span&gt; -it hermes-agent ssh-keygen -t ed25519 -f /root/.ssh/id_ed25519
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="c1"&gt;# 容器重启后密钥丢失&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;h3 id="错误2密钥权限不正确"&gt;错误2：密钥权限不正确&lt;/h3&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" class="chroma"&gt;&lt;code class="language-bash" data-lang="bash"&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="c1"&gt;# ❌ 错误做法&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;chmod &lt;span class="m"&gt;644&lt;/span&gt; /home/ksboy/.ssh/hermes_key &lt;span class="c1"&gt;# SSH拒绝使用&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="c1"&gt;# ✅ 正确做法&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;chmod &lt;span class="m"&gt;600&lt;/span&gt; /home/ksboy/.ssh/hermes_key
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;h3 id="错误3宿主机ssh只监听localhost"&gt;错误3：宿主机SSH只监听localhost&lt;/h3&gt;
&lt;div class="highlight"&gt;&lt;pre tabindex="0" class="chroma"&gt;&lt;code class="language-bash" data-lang="bash"&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="c1"&gt;# ❌ 错误做法&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;ListenAddress 127.0.0.1 &lt;span class="c1"&gt;# Docker容器无法连接&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;&lt;span class="c1"&gt;# ✅ 正确做法&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class="line"&gt;&lt;span class="cl"&gt;ListenAddress 0.0.0.0
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;h2 id="六总结"&gt;六、总结&lt;/h2&gt;
&lt;p&gt;SSH密钥持久化的核心是理解Docker的文件系统隔离机制：&lt;/p&gt;</description><content:encoded><![CDATA[<h2 id="前言">前言</h2>
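<p>In May 2026 I kept running into the same problem: an SSH key generated inside a container was gone after the container was restarted, so the container could no longer connect to the host over SSH.</p>
<p>The problem looks trivial, but it comes down to how Docker isolates a container's filesystem. This article records the full troubleshooting process and the final solution.</p>
<h2 id="一问题现象">1. Symptoms</h2>
<pre tabindex="0"><code>Symptom: SSH key generated inside the container disappears after a restart; the host can no longer be reached
Date: 2026-05-18
Environment: fnOS virtualization platform + Ubuntu 24.04 VM + Docker
</code></pre><p><strong>Initial error</strong>:</p>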
<pre tabindex="0"><code>Warning: Permanently added &#39;192.168.0.200&#39; (ED25519) to the list of known hosts.
Permission denied (publickey).
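</code></pre><h2 id="二根因分析">2. Root cause analysis</h2>
<h3 id="21-docker容器的文件系统隔离">2.1 Filesystem isolation in Docker containers</h3>
<p>Docker containers use a <strong>union filesystem (UnionFS)</strong>: read-only image layers plus a thin writable layer that belongs to a single container. For files created inside the container this means:</p>
<ol>
<li><strong>Files generated inside the container</strong> → stored in the writable layer of that container</li>
<li><strong>Container recreated</strong> → the writable layer is discarded with the old container, and every file not persisted to a volume or bind mount is lost. A plain <code>docker restart</code> keeps the writable layer; <code>docker compose up</code> after an image or configuration change, or <code>docker rm</code> followed by <code>docker run</code>, creates a new container, which is often what a "restart" turns out to be in practice.</li>
<li><strong>SSH key lost</strong> → key-based authentication to the host fails</li>
</ol>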
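<p>Which lifecycle operations actually discard the writable layer can be checked directly. The sketch below assumes a local Docker daemon, the public <code>alpine</code> image, and a free container name <code>layer-demo</code> (a hypothetical name chosen for this example); it writes a probe file, restarts the container, then removes and recreates it.</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash">#!/usr/bin/env bash
# Assumptions: local Docker daemon, public alpine image,
# and an unused container name "layer-demo".
demo_writable_layer() {
  local name=layer-demo
  docker run -d --name "$name" alpine sleep 3600
  docker exec "$name" sh -c 'echo probe > /root/probe'

  # Restart: same container, same writable layer.
  docker restart -t 1 "$name"
  if docker exec "$name" test -f /root/probe; then
    echo "after restart: file present"
  else
    echo "after restart: file missing"
  fi

  # Remove and run again: the new container starts from the image layers only.
  docker rm -f "$name"
  docker run -d --name "$name" alpine sleep 3600
  if docker exec "$name" test -f /root/probe; then
    echo "after recreate: file present"
  else
    echo "after recreate: file missing"
  fi
  docker rm -f "$name"
}
</code></pre></div>
<p>Calling <code>demo_writable_layer</code> on a stock Docker installation should report the probe file as present after the restart and missing after the recreate, which is why files that must survive belong on a bind mount or volume.</p>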
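<h3 id="22-为什么宿主机ssh拒绝使用容器内密钥">2.2 Why the host's SSH service rejects the key from the container</h3>
<p>Even with the key mounted into the container, the host's SSH service can still refuse the login:</p>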
<pre tabindex="0"><code># /var/log/auth.log
sshd[12345]: Authentication refused: bad ownership or modes for key file
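</code></pre><p><strong>Cause</strong>: this message comes from the <code>StrictModes</code> check in sshd, which inspects the ownership and permissions of the account's home directory, <code>~/.ssh</code> and <code>authorized_keys</code> on the host. Separately, the ssh client refuses a private key that is owned by the invoking user but readable by group or others, so keep it at <code>600</code>. A bind mount preserves the host's numeric UID/GID and mode, so the owner seen inside the container may not match the user running ssh.</p>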
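<p>The mode requirement is easy to check before a key is mounted. The following is a minimal sketch, assuming GNU <code>stat</code> as shipped with Ubuntu; <code>key_mode_ok</code> is a hypothetical helper name, and the path in the usage comment is the key path used in this article.</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash">#!/usr/bin/env bash
# key_mode_ok FILE: succeed only if FILE has mode 600 or 400,
# i.e. no permissions for group or others. Uses GNU stat (-c).
# Returns 2 if the file cannot be inspected at all.
key_mode_ok() {
  local mode
  mode=$(stat -c '%a' "$1") || return 2
  case "$mode" in
    600|400) return 0 ;;
    *) return 1 ;;
  esac
}

# Example:
# key_mode_ok /home/ksboy/.ssh/hermes_key || echo "fix permissions before mounting"
</code></pre></div>
<p>The check covers only the permission bits; ownership still has to be compared against the UID that runs ssh inside the container.</p>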
<h2 id="三解决方案">3. Solution</h2>
<h3 id="31-核心原则">3.1 Core principle</h3>
<p><strong>Generate the key on the host and mount it into the container; do not generate it inside the container.</strong></p>
<h3 id="32-完整步骤">3.2 Step by step</h3>
<h4 id="步骤1在宿主机生成ssh密钥">Step 1: Generate the SSH key on the host</h4>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="cl"><span class="c1"># Run on the host (192.168.0.200)</span>
</span></span><span class="line"><span class="cl">ssh-keygen -t ed25519 -C <span class="s2">&#34;hermes-agent&#34;</span> -f /home/ksboy/.ssh/hermes_key
</span></span><span class="line"><span class="cl"><span class="c1"># Set permissions</span>
</span></span><span class="line"><span class="cl">chmod <span class="m">600</span> /home/ksboy/.ssh/hermes_key
</span></span><span class="line"><span class="cl">chmod <span class="m">644</span> /home/ksboy/.ssh/hermes_key.pub
</span></span></code></pre></div><h4 id="步骤2将公钥添加到宿主机授权文件">Step 2: Add the public key to the host's authorized_keys</h4>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="cl"><span class="c1"># Run on the host</span>
</span></span><span class="line"><span class="cl">cat /home/ksboy/.ssh/hermes_key.pub &gt;&gt; /home/ksboy/.ssh/authorized_keys
</span></span><span class="line"><span class="cl">chmod <span class="m">600</span> /home/ksboy/.ssh/authorized_keys
</span></span></code></pre></div><h4 id="步骤3在docker-composeyml中挂载密钥">Step 3: Mount the key in docker-compose.yml</h4>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-yaml" data-lang="yaml"><span class="line"><span class="cl"><span class="nt">services</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">  </span><span class="nt">hermes-agent</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">    </span><span class="nt">image</span><span class="p">:</span><span class="w"> </span><span class="l">hermes-agent:latest</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">    </span><span class="nt">volumes</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">      </span><span class="c"># Key mounts (read-only)</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">      </span>- <span class="l">/home/ksboy/.ssh/hermes_key:/root/.ssh/id_ed25519:ro</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">      </span>- <span class="l">/home/ksboy/.ssh/hermes_key.pub:/root/.ssh/id_ed25519.pub:ro</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">      </span><span class="c"># SSH client configuration</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">      </span>- <span class="l">/home/ksboy/.ssh/config:/root/.ssh/config:ro</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">    </span><span class="nt">environment</span><span class="p">:</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">      </span>- <span class="l">SSH_HOST=192.168.0.200</span><span class="w">
</span></span></span><span class="line"><span class="cl"><span class="w">      </span>- <span class="l">SSH_USER=ksboy</span><span class="w">
</span></span></span></code></pre></div><h4 id="步骤4宿主机ssh配置调整">Step 4: Adjust the host's SSH configuration</h4>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="cl"><span class="c1"># /etc/ssh/sshd_config</span>
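</span></span><span class="line"><span class="cl"><span class="c1"># Listen on all IPv4 addresses so the Docker network can reach sshd (also the default when no ListenAddress is set)</span>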
</span></span><span class="line"><span class="cl">ListenAddress 0.0.0.0
</span></span><span class="line"><span class="cl">
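</span></span><span class="line"><span class="cl"><span class="c1"># Run in a shell, not in sshd_config: restart the SSH service (on Ubuntu the unit may be named ssh rather than sshd)</span>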
</span></span><span class="line"><span class="cl">systemctl restart sshd
</span></span></code></pre></div><h3 id="33-验证">3.3 Verification</h3>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="cl"><span class="c1"># Test from inside the container</span>
</span></span><span class="line"><span class="cl">ssh -i /root/.ssh/id_ed25519 ksboy@192.168.0.200 <span class="s2">&#34;echo &#39;connected&#39;&#34;</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl"><span class="c1"># Recreate the container (run on the host), then test again from inside it</span>
</span></span><span class="line"><span class="cl">docker compose up -d --force-recreate hermes-agent
</span></span><span class="line"><span class="cl">ssh -i /root/.ssh/id_ed25519 ksboy@192.168.0.200 <span class="s2">&#34;echo &#39;connected after recreate&#39;&#34;</span>
</span></span></code></pre></div><h2 id="四关键要点">4. Key points</h2>
<table>
	<thead>
			<tr>
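					<th>Point</th>
					<th>Notes</th>
			</tr>
	</thead>
	<tbody>
			<tr>
					<td>Where the key is generated</td>
					<td><strong>On the host</strong> (or on a persistent volume); a key that lives only in the container's writable layer is lost when the container is recreated</td>
			</tr>
			<tr>
					<td>File ownership</td>
					<td>The private key must be readable by the user that runs ssh in the container and closed to group and others (mode <code>600</code>); a bind mount keeps the host's numeric UID/GID</td>
			</tr>
			<tr>
					<td>Mount mode</td>
					<td>Use <code>:ro</code> read-only mode to prevent accidental modification from inside the container</td>
			</tr>
			<tr>
					<td>SSH listen address</td>
					<td>The host's sshd must listen on an address the Docker network can reach, for example <code>0.0.0.0</code></td>
			</tr>
			<tr>
					<td>Network isolation</td>
					<td>The container sits on a Docker network and the host on the LAN, so routing between them must be configured correctly</td>
			</tr>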
	</tbody>
</table>
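<p>The last two rows are about reachability, which can be tested separately from authentication. This sketch assumes the container image provides <code>bash</code> (with its <code>/dev/tcp</code> redirection feature) and the coreutils <code>timeout</code> command; <code>port_open</code> is a hypothetical helper name, and the address in the usage comment is the host address used in this article.</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash">#!/usr/bin/env bash
# port_open HOST PORT: succeed if a TCP connection can be opened within 3 seconds.
# Relies on bash's /dev/tcp redirection and the coreutils timeout command.
port_open() {
  timeout 3 bash -c ': > "/dev/tcp/$0/$1"' "$1" "$2" 2>/dev/null
}

# Example, run inside the container:
# port_open 192.168.0.200 22 || echo "sshd is not reachable from this network"
</code></pre></div>
<p>If the port is closed or filtered, the problem lies in the listen address, firewall or routing rather than in the key, which narrows down where to look.</p>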
<h2 id="五常见错误">5. Common mistakes</h2>
<h3 id="错误1容器内生成密钥">Mistake 1: Generating the key inside the container</h3>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="cl"><span class="c1"># ❌ Wrong</span>
</span></span><span class="line"><span class="cl">docker <span class="nb">exec</span> -it hermes-agent ssh-keygen -t ed25519 -f /root/.ssh/id_ed25519
</span></span><span class="line"><span class="cl"><span class="c1"># The key is lost once the container is recreated</span>
</span></span></code></pre></div><h3 id="错误2密钥权限不正确">Mistake 2: Wrong key permissions</h3>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="cl"><span class="c1"># ❌ Wrong</span>
</span></span><span class="line"><span class="cl">chmod <span class="m">644</span> /home/ksboy/.ssh/hermes_key  <span class="c1"># ssh refuses to use the key</span>
</span></span><span class="line"><span class="cl"><span class="c1"># ✅ Correct</span>
</span></span><span class="line"><span class="cl">chmod <span class="m">600</span> /home/ksboy/.ssh/hermes_key
</span></span></code></pre></div><h3 id="错误3宿主机ssh只监听localhost">Mistake 3: Host sshd listening only on localhost</h3>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="cl"><span class="c1"># ❌ Wrong</span>
</span></span><span class="line"><span class="cl">ListenAddress 127.0.0.1  <span class="c1"># Docker containers cannot connect</span>
</span></span><span class="line"><span class="cl"><span class="c1"># ✅ Correct</span>
</span></span><span class="line"><span class="cl">ListenAddress 0.0.0.0
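</span></span></code></pre></div><h2 id="六总结">6. Summary</h2>
<p>Persisting SSH keys comes down to understanding Docker's filesystem isolation:</p>
<ol>
<li><strong>Files inside a container are not persistent</strong> → generate the key on the host and mount it</li>
<li><strong>Permissions must match</strong> → the key must be readable by the user the container runs as, and by no one else</li>
<li><strong>The network must be reachable</strong> → the host's sshd must listen on an address the container can reach</li>
</ol>
<p>This setup has been running stably for more than two weeks, and SSH connections keep working after container restarts.</p>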
<hr>
<blockquote>
<p><strong>Related document</strong>: <a href="~/.hermes/skills/git-credential-persistence/SKILL.md">SSH key persistence skill</a></p>
</blockquote>
]]></content:encoded></item><item><title>银狐病毒 (SilverFox) 深度分析：Go语言木马的感染链与检测实战</title><link>https://www.chaoyuewang.cn/posts/security/silverfox-deep-analysis-2026/</link><pubDate>Mon, 25 May 2026 00:00:00 +0000</pubDate><guid>https://www.chaoyuewang.cn/posts/security/silverfox-deep-analysis-2026/</guid><description>&lt;h2 id="前言"&gt;前言&lt;/h2&gt;
&lt;p&gt;银狐病毒（SilverFox）是2022年9月由腾讯安全、360、微步在线三家厂商几乎同时独立发现的针对中国企业的恶意软件家族。与传统的C/C++木马不同，银狐使用 &lt;strong&gt;Go语言编写&lt;/strong&gt;，这带来了独特的检测挑战和特征。&lt;/p&gt;
&lt;p&gt;银狐的目标明确：中国企业的财务部门。攻击手法成熟：钓鱼邮件、即时通讯、假冒软件更新。持久化手段多样：注册表、WMI、计划任务、AppInit_DLLs。防御规避专业：篡改Windows Defender排除项、进程注入、随机进程名。&lt;/p&gt;
&lt;p&gt;本文基于开源检测工具源代码分析，提供：&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;银狐的完整感染链分析&lt;/li&gt;
&lt;li&gt;Go语言木马的技术特征&lt;/li&gt;
&lt;li&gt;增强版YARA规则（覆盖行为特征）&lt;/li&gt;
&lt;li&gt;可直接使用的检测脚本&lt;/li&gt;
&lt;/ul&gt;
&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;声明&lt;/strong&gt;: 本文IOC来自开源检测工具源代码，最新IOC请从官方查杀工具获取。&lt;/p&gt;
&lt;/blockquote&gt;
&lt;hr&gt;
&lt;h2 id="一银狐病毒技术特征"&gt;一、银狐病毒技术特征&lt;/h2&gt;
&lt;h3 id="11-go语言木马的特征"&gt;1.1 Go语言木马的特征&lt;/h3&gt;
&lt;p&gt;银狐使用Go语言编写，具有以下可检测特征：&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;特征类型&lt;/th&gt;
&lt;th&gt;检测方法&lt;/th&gt;
&lt;th&gt;说明&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Go运行时库&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;内存扫描/字符串分析&lt;/td&gt;
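&lt;td&gt;Go binaries are statically linked and embed the Go runtime instead of loading separate runtime DLLs; unless obfuscated, they contain markers such as the &lt;code&gt;Go build ID&lt;/code&gt; string and &lt;code&gt;runtime.&lt;/code&gt; symbol names&lt;/td&gt;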
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Go二进制结构&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;PE头分析&lt;/td&gt;
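&lt;td&gt;Go-compiled binaries carry a build-info blob that starts with the magic bytes &lt;code&gt;\xff Go buildinf:&lt;/code&gt;; ELF files keep it in a &lt;code&gt;.go.buildinfo&lt;/code&gt; section, while PE files store it in a data section&lt;/td&gt;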
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Go异常处理&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;行为分析&lt;/td&gt;
&lt;td&gt;Go的panic/recover机制与C++异常处理不同&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Go协程特征&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;线程行为&lt;/td&gt;
&lt;td&gt;Go的Goroutine调度器会产生特定的线程创建模式&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
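&lt;h3 id="12-银狐的行为特征"&gt;1.2 Behavioral Characteristics of SilverFox&lt;/h3&gt;
&lt;p&gt;According to the analysis in the open-source detection tools, SilverFox exhibits the following behaviors:&lt;/p&gt;
&lt;pre tabindex="0"&gt;&lt;code&gt;1. Process injection: injects into system processes such as svchost.exe
2. Registry persistence: HKCU/HKLM Run keys + AppInit_DLLs
3. WMI event subscription: __EventFilter + __EventConsumer + __FilterToConsumerBinding
4. Scheduled tasks: creates tasks named Task1 or related to SilverFox
5. Windows Defender exclusions: tampers with exclusion paths to evade detection
6. File masquerading: uses svchost64.exe and random process names (pXDc9LSz.exe)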
&lt;/code&gt;&lt;/pre&gt;&lt;hr&gt;
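&lt;h2 id="二感染链分析"&gt;2. Infection Chain Analysis&lt;/h2&gt;
&lt;p&gt;The complete SilverFox attack chain is as follows:&lt;/p&gt;
&lt;pre tabindex="0"&gt;&lt;code&gt;┌─────────────────────────────────────────────────────────────────────┐
│ SilverFox Infection Chain │
├─────────────────────────────────────────────────────────────────────┤
│ │
│ Stage 1: Initial Access │
│ ├── Phishing email (disguised as invoices or contracts) │
│ ├── Instant messaging (malicious files sent via WeChat/DingTalk) │
│ └── Fake software updates (finance software, OA systems) │
│ │
│ Stage 2: Execution │
│ ├── User double-clicks the malicious attachment │
│ ├── Malicious macro code runs │
│ └── Social-engineering lures (&amp;#34;文件恢复指南&amp;#34;, etc.) │
│ │
│ Stage 3: Persistence │
│ ├── Registry Run key writes │
│ ├── WMI event subscription (__EventFilter) │
│ ├── Scheduled task creation │
│ └── AppInit_DLLs injection │
│ │
│ Stage 4: Defense Evasion │
│ ├── Windows Defender exclusion tampering │
│ ├── Process injection (svchost.exe) │
│ ├── Random process name generation │
│ └── File masquerading (svchost64.exe) │
│ │
│ Stage 5: C2 Communication │
│ ├── HTTP/HTTPS heartbeat packets │
│ ├── DNS queries (possibly DGA-based) │
│ └── Encrypted traffic (TLS/custom protocol) │
│ │
│ Stage 6: Data Theft │
│ ├── Browser credential theft │
│ ├── Finance software credential theft │
│ └── Instant messaging credential theft │
│ │
└─────────────────────────────────────────────────────────────────────┘
&lt;/code&gt;&lt;/pre&gt;&lt;h3 id="21-各阶段检测要点"&gt;2.1 Detection Points by Stage&lt;/h3&gt;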
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Stage&lt;/th&gt;
&lt;th&gt;Detection focus&lt;/th&gt;
&lt;th&gt;Detection tools&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Initial access&lt;/td&gt;
&lt;td&gt;Email attachments, phishing links&lt;/td&gt;
&lt;td&gt;Email gateway, URL filtering&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Execution&lt;/td&gt;
&lt;td&gt;Suspicious process launches&lt;/td&gt;
&lt;td&gt;EDR, process monitoring&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Persistence&lt;/td&gt;
&lt;td&gt;Registry, WMI, scheduled tasks&lt;/td&gt;
&lt;td&gt;Registry monitoring, WMI monitoring&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Defense evasion&lt;/td&gt;
&lt;td&gt;Defender exclusions, process injection&lt;/td&gt;
&lt;td&gt;Security configuration auditing, memory scanning&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;C2 communication&lt;/td&gt;
&lt;td&gt;Anomalous network connections, DNS queries&lt;/td&gt;
&lt;td&gt;Network traffic analysis, DNS monitoring&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Data theft&lt;/td&gt;
&lt;td&gt;Credential access, file exfiltration&lt;/td&gt;
&lt;td&gt;DLP, credential monitoring&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;hr&gt;
&lt;h2 id="三ioc-列表来自开源工具"&gt;3. IOC List (from Open-Source Tools)&lt;/h2&gt;
&lt;p&gt;The following IOCs come from the source code of &lt;a href="https://github.com/zseagate/SilverFox-Scanner"&gt;zseagate/SilverFox-Scanner&lt;/a&gt; and &lt;a href="https://github.com/das-secbox/silverfox_scanner"&gt;das-secbox/silverfox_scanner&lt;/a&gt;.&lt;/p&gt;</description><content:encoded><![CDATA[<h2 id="前言">Preface</h2>
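<p>SilverFox (银狐) is a malware family that targets Chinese enterprises. It was discovered in September 2022, almost simultaneously and independently, by three vendors: Tencent Security, 360, and ThreatBook (微步在线). Unlike traditional C/C++ trojans, SilverFox is <strong>written in Go</strong>, which brings its own detection challenges and its own telltale features.</p>
<p>SilverFox has a clear target: the finance departments of Chinese enterprises. Its delivery methods are mature: phishing email, instant messaging, and fake software updates. Its persistence mechanisms are varied: the registry, WMI, scheduled tasks, and AppInit_DLLs. Its defense evasion is professional: tampering with Windows Defender exclusions, process injection, and random process names.</p>
<p>Based on an analysis of the source code of open-source detection tools, this article provides:</p>
<ul>
<li>A complete analysis of the SilverFox infection chain</li>
<li>The technical characteristics of a Go-based trojan</li>
<li>Enhanced YARA rules (covering behavioral features)</li>
<li>Ready-to-use detection scripts</li>
</ul>
<blockquote>
<p><strong>Note</strong>: The IOCs in this article come from the source code of open-source detection tools; for the latest IOCs, refer to the official removal tools.</p>
</blockquote>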
<hr>
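<h2 id="一银狐病毒技术特征">1. Technical Characteristics of SilverFox</h2>
<h3 id="11-go语言木马的特征">1.1 Characteristics of a Go-Based Trojan</h3>
<p>Because SilverFox is written in Go, it has the following detectable characteristics:</p>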
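<table>
	<thead>
			<tr>
					<th>Feature type</th>
					<th>Detection method</th>
					<th>Notes</th>
			</tr>
	</thead>
	<tbody>
			<tr>
					<td><strong>Go runtime</strong></td>
					<td>Memory scanning / string analysis</td>
					<td>Go links its runtime statically into the executable instead of loading a separate runtime DLL, so runtime symbol names such as <code>runtime.main</code> and the <code>Go build ID</code> string usually remain in the binary</td>
			</tr>
			<tr>
					<td><strong>Go binary structure</strong></td>
					<td>PE header analysis</td>
					<td>Go-compiled binaries embed build metadata behind the <code>\xff Go buildinf:</code> marker (the <code>.go.buildinfo</code> section in ELF files; in PE files it typically sits in the data section)</td>
			</tr>
			<tr>
					<td><strong>Go exception handling</strong></td>
					<td>Behavioral analysis</td>
					<td>Go's panic/recover mechanism differs from C++ exception handling</td>
			</tr>
			<tr>
					<td><strong>Goroutine characteristics</strong></td>
					<td>Thread behavior</td>
					<td>The goroutine scheduler produces a distinctive thread-creation pattern</td>
			</tr>
	</tbody>
</table>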
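<p>The string-analysis row in the table can be tried with ordinary shell tools. The following is a minimal sketch, assuming a Bash environment with <code>grep</code>; the function name <code>go_markers</code> and the marker list are illustrative assumptions rather than part of any tool cited in this article, and an obfuscated or packed sample may contain none of these strings.</p>

```bash
#!/bin/bash
# Hypothetical helper: report which Go toolchain markers appear in a file.
# The marker list is an assumption for illustration, not an official signature set.
go_markers() {
    local file="$1" marker found=0
    for marker in "Go build ID:" "Go buildinf:" "runtime.main" "runtime.goexit"; do
        # -a reads binary input as text, -F matches the literal string, -q prints nothing
        if grep -aqF -- "$marker" "$file"; then
            echo "marker found: $marker"
            found=1
        fi
    done
    # Return status 0 only when at least one marker matched
    [ "$found" -eq 1 ]
}
```

<p>For example, <code>go_markers ./sample.bin</code> prints every marker it finds and returns a non-zero status when none is present. A hit only suggests that the file was built with the Go toolchain; it says nothing about whether the file is malicious.</p>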
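<h3 id="12-银狐的行为特征">1.2 Behavioral Characteristics of SilverFox</h3>
<p>According to the analysis in the open-source detection tools, SilverFox exhibits the following behaviors:</p>
<pre tabindex="0"><code>1. Process injection: injects into system processes such as svchost.exe
2. Registry persistence: HKCU/HKLM Run keys + AppInit_DLLs
3. WMI event subscription: __EventFilter + __EventConsumer + __FilterToConsumerBinding
4. Scheduled tasks: creates tasks named Task1 or related to SilverFox
5. Windows Defender exclusions: tampers with exclusion paths to evade detection
6. File masquerading: uses svchost64.exe and random process names (pXDc9LSz.exe)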
</code></pre><hr>
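<h2 id="二感染链分析">2. Infection Chain Analysis</h2>
<p>The complete SilverFox attack chain is as follows:</p>
<pre tabindex="0"><code>┌─────────────────────────────────────────────────────────────────────┐
│                      SilverFox Infection Chain                      │
├─────────────────────────────────────────────────────────────────────┤
│                                                                     │
│  Stage 1: Initial Access                                            │
│  ├── Phishing email (disguised as invoices or contracts)            │
│  ├── Instant messaging (malicious files sent via WeChat/DingTalk)   │
│  └── Fake software updates (finance software, OA systems)           │
│                                                                     │
│  Stage 2: Execution                                                 │
│  ├── User double-clicks the malicious attachment                    │
│  ├── Malicious macro code runs                                      │
│  └── Social-engineering lures (&#34;文件恢复指南&#34;, etc.)                │
│                                                                     │
│  Stage 3: Persistence                                               │
│  ├── Registry Run key writes                                        │
│  ├── WMI event subscription (__EventFilter)                         │
│  ├── Scheduled task creation                                        │
│  └── AppInit_DLLs injection                                         │
│                                                                     │
│  Stage 4: Defense Evasion                                           │
│  ├── Windows Defender exclusion tampering                           │
│  ├── Process injection (svchost.exe)                                │
│  ├── Random process name generation                                 │
│  └── File masquerading (svchost64.exe)                              │
│                                                                     │
│  Stage 5: C2 Communication                                          │
│  ├── HTTP/HTTPS heartbeat packets                                   │
│  ├── DNS queries (possibly DGA-based)                               │
│  └── Encrypted traffic (TLS/custom protocol)                        │
│                                                                     │
│  Stage 6: Data Theft                                                │
│  ├── Browser credential theft                                       │
│  ├── Finance software credential theft                              │
│  └── Instant messaging credential theft                             │
│                                                                     │
└─────────────────────────────────────────────────────────────────────┘
</code></pre><h3 id="21-各阶段检测要点">2.1 Detection Points by Stage</h3>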
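<table>
	<thead>
			<tr>
					<th>Stage</th>
					<th>Detection focus</th>
					<th>Detection tools</th>
			</tr>
	</thead>
	<tbody>
			<tr>
					<td>Initial access</td>
					<td>Email attachments, phishing links</td>
					<td>Email gateway, URL filtering</td>
			</tr>
			<tr>
					<td>Execution</td>
					<td>Suspicious process launches</td>
					<td>EDR, process monitoring</td>
			</tr>
			<tr>
					<td>Persistence</td>
					<td>Registry, WMI, scheduled tasks</td>
					<td>Registry monitoring, WMI monitoring</td>
			</tr>
			<tr>
					<td>Defense evasion</td>
					<td>Defender exclusions, process injection</td>
					<td>Security configuration auditing, memory scanning</td>
			</tr>
			<tr>
					<td>C2 communication</td>
					<td>Anomalous network connections, DNS queries</td>
					<td>Network traffic analysis, DNS monitoring</td>
			</tr>
			<tr>
					<td>Data theft</td>
					<td>Credential access, file exfiltration</td>
					<td>DLP, credential monitoring</td>
			</tr>
	</tbody>
</table>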
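<p>The defense-evasion row relies on telling the genuine <code>svchost.exe</code> apart from a copy that merely borrows the name. Below is a minimal sketch of that path check, assuming a default Windows installation on drive <code>C:</code> and a process list that has already been exported as full image paths; the function name <code>is_fake_svchost</code> is hypothetical.</p>

```bash
#!/bin/bash
# Hypothetical helper: succeed (status 0) when an image path looks like a
# masquerading svchost. The two trusted directories are assumptions based on
# a default Windows install on drive C:.
is_fake_svchost() {
    local bs='\' path name dir
    # Windows paths are case-insensitive, so compare in lower case
    path=$(printf '%s' "$1" | tr 'A-Z' 'a-z')
    name=${path##*"$bs"}   # text after the last backslash
    dir=${path%"$bs"*}     # text before the last backslash
    case "$name" in
        svchost.exe)
            case "$dir" in
                'c:\windows\system32'|'c:\windows\syswow64') return 1 ;;
                *) return 0 ;;   # same name, unexpected (or unknown) location
            esac ;;
        svchost64.exe) return 0 ;;   # look-alike name reported for SilverFox in this article
        *) return 1 ;;
    esac
}
```

<p>Calling <code>is_fake_svchost 'C:\Users\Public\Documents\svchost.exe'</code> succeeds, while the same file name under <code>C:\Windows\System32</code> does not. The check is only a first filter: a flagged path still needs its signature and hash verified.</p>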
<hr>
<h2 id="三ioc-列表来自开源工具">3. IOC List (from Open-Source Tools)</h2>
<p>The following IOCs come from the source code of <a href="https://github.com/zseagate/SilverFox-Scanner">zseagate/SilverFox-Scanner</a> and <a href="https://github.com/das-secbox/silverfox_scanner">das-secbox/silverfox_scanner</a>.</p>
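<p>Several of the indicators in this section are wildcard patterns rather than exact names (for example <code>xfolder32*</code> and <code>*silverfox*</code>). Here is a minimal Bash sketch of how such patterns could be applied to a process name; the helper <code>match_ioc</code> is an illustrative assumption and is not taken from either repository.</p>

```bash
#!/bin/bash
# Hypothetical helper: test a process name against wildcard IOC patterns.
# svchost.exe is left out on purpose: its name alone cannot tell the genuine
# binary from a fake, so it needs a path check instead.
ioc_patterns=("foxservice.exe" "xfolder32*" "*silverfox*" "svchost64.exe")

match_ioc() {
    local name pattern
    # Normalize case so that FoxService.exe and foxservice.exe compare equal
    name=$(printf '%s' "$1" | tr 'A-Z' 'a-z')
    for pattern in "${ioc_patterns[@]}"; do
        # An unquoted right-hand side inside [[ == ]] is treated as a glob
        if [[ "$name" == $pattern ]]; then
            echo "$pattern"
            return 0
        fi
    done
    return 1
}
```

<p>The function prints the first pattern that matched and returns a non-zero status when nothing matches, so it can be used directly in an <code>if</code> statement while looping over a process listing.</p>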
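<h3 id="31-恶意进程名">3.1 Malicious Process Names</h3>
<pre tabindex="0"><code>foxservice.exe
xfolder32*
svchost.exe          # Note: the genuine svchost lives in System32; a copy at any other path is malicious
*silverfox*
pXDc9LSz.exe         # Example of a randomly generated process name
pQpfOm.exe           # Example of a randomly generated process name
svchost64.exe        # Masquerading process
</code></pre><h3 id="32-注册表持久化">3.2 Registry Persistence</h3>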
<pre tabindex="0"><code>HKCU\Software\Microsoft\Windows\CurrentVersion\Run
HKLM\Software\Microsoft\Windows\CurrentVersion\Run
HKLM\SOFTWARE\Microsoft\Windows NT\CurrentVersion\Windows\AppInit_DLLs
HKCU\Software\Microsoft\Windows\CurrentVersion\Explorer\Shell Folders
</code></pre><h3 id="33-wmi-持久化">3.3 WMI Persistence</h3>
<pre tabindex="0"><code>__EventFilter
__EventConsumer
__FilterToConsumerBinding
Namespace: root\subscription
</code></pre><h3 id="34-计划任务">3.4 Scheduled Tasks</h3>
<pre tabindex="0"><code>Task1
SilverFox
</code></pre><h3 id="35-恶意文件特征">3.5 Malicious File Indicators</h3>
<pre tabindex="0"><code>*.silverfox
*silverfox*
foxservice
svchost64.exe
!!!文件恢复指南*
</code></pre><h3 id="36-恶意文件路径">3.6 Malicious File Paths</h3>
<pre tabindex="0"><code>C:\ProgramData\xfolder32
C:\Users\Public\Documents\
C:\Users\$USERNAME\AppData\Local\Temp\
</code></pre><h3 id="37-windows-defender-排除项">3.7 Windows Defender Exclusions</h3>
<p>SilverFox often tampers with the Windows Defender exclusion paths to evade detection, so check them with:</p>
<pre tabindex="0"><code>Get-MpPreference | Select-Object -ExpandProperty ExclusionPath
</code></pre><hr>
<h2 id="四检测脚本">4. Detection Scripts</h2>
<h3 id="41-windows-检测powershell">4.1 Windows Detection (PowerShell)</h3>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-powershell" data-lang="powershell"><span class="line"><span class="cl"><span class="c"># SilverFox detection script - Windows version</span>
</span></span><span class="line"><span class="cl"><span class="c"># Source: zseagate/SilverFox-Scanner</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl"><span class="nb">Write-Host</span> <span class="s2">&#34;=== SilverFox Detection (Windows) ===&#34;</span> <span class="n">-ForegroundColor</span> <span class="n">Cyan</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl"><span class="c"># 1. Check for malicious processes</span>
</span></span><span class="line"><span class="cl"><span class="c"># Get-Process reports names without the .exe suffix, so it is re-added before matching.</span>
</span></span><span class="line"><span class="cl"><span class="c"># svchost.exe is judged by path, because the genuine binary has the same name.</span>
</span></span><span class="line"><span class="cl"><span class="nb">Write-Host</span> <span class="s2">&#34;</span><span class="se">`n</span><span class="s2">[1/6] Checking for suspicious processes...&#34;</span> <span class="n">-ForegroundColor</span> <span class="n">Yellow</span>
</span></span><span class="line"><span class="cl"><span class="nv">$maliciousProcesses</span> <span class="p">=</span> <span class="vm">@</span><span class="p">(</span><span class="s2">&#34;foxservice.exe&#34;</span><span class="p">,</span> <span class="s2">&#34;xfolder32*&#34;</span><span class="p">,</span> <span class="s2">&#34;*silverfox*&#34;</span><span class="p">,</span> <span class="s2">&#34;pXDc9LSz.exe&#34;</span><span class="p">,</span> <span class="s2">&#34;pQpfOm.exe&#34;</span><span class="p">,</span> <span class="s2">&#34;svchost64.exe&#34;</span><span class="p">)</span>
</span></span><span class="line"><span class="cl"><span class="nv">$foundProcesses</span> <span class="p">=</span> <span class="nb">Get-Process</span> <span class="p">|</span> <span class="nb">Where-Object</span> <span class="p">{</span> <span class="nv">$processName</span> <span class="p">=</span> <span class="s2">&#34;</span><span class="p">$(</span><span class="nv">$_</span><span class="p">.</span><span class="n">Name</span><span class="p">)</span><span class="s2">.exe&#34;</span><span class="p">;</span> <span class="nv">$path</span> <span class="p">=</span> <span class="nv">$_</span><span class="p">.</span><span class="n">Path</span><span class="p">;</span> <span class="p">(</span><span class="nv">$maliciousProcesses</span> <span class="p">|</span> <span class="nb">Where-Object</span> <span class="p">{</span> <span class="nv">$processName</span> <span class="o">-like</span> <span class="nv">$_</span> <span class="p">})</span> <span class="o">-or</span> <span class="p">(</span><span class="nv">$processName</span> <span class="o">-eq</span> <span class="s2">&#34;svchost.exe&#34;</span> <span class="o">-and</span> <span class="nv">$path</span> <span class="o">-and</span> <span class="nv">$path</span> <span class="o">-notlike</span> <span class="s2">&#34;</span><span class="nv">$env:SystemRoot</span><span class="s2">\System32\*&#34;</span> <span class="o">-and</span> <span class="nv">$path</span> <span class="o">-notlike</span> <span class="s2">&#34;</span><span class="nv">$env:SystemRoot</span><span class="s2">\SysWOW64\*&#34;</span><span class="p">)</span> <span class="p">}</span>
</span></span><span class="line"><span class="cl"><span class="k">if</span> <span class="p">(</span><span class="nv">$foundProcesses</span><span class="p">)</span> <span class="p">{</span>
</span></span><span class="line"><span class="cl">    <span class="nb">Write-Host</span> <span class="s2">&#34;Suspicious processes found:&#34;</span> <span class="n">-ForegroundColor</span> <span class="n">Red</span>
</span></span><span class="line"><span class="cl">    <span class="nv">$foundProcesses</span> <span class="p">|</span> <span class="nb">Format-Table</span> <span class="n">Id</span><span class="p">,</span> <span class="n">Name</span><span class="p">,</span> <span class="n">Path</span><span class="p">,</span> <span class="n">StartTime</span> <span class="n">-AutoSize</span>
</span></span><span class="line"><span class="cl"><span class="p">}</span> <span class="k">else</span> <span class="p">{</span>
</span></span><span class="line"><span class="cl">    <span class="nb">Write-Host</span> <span class="s2">&#34;No known malicious processes found&#34;</span> <span class="n">-ForegroundColor</span> <span class="n">Green</span>
</span></span><span class="line"><span class="cl"><span class="p">}</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl"><span class="c"># 2. Check registry persistence entries</span>
</span></span><span class="line"><span class="cl"><span class="nb">Write-Host</span> <span class="s2">&#34;</span><span class="se">`n</span><span class="s2">[2/6] Checking registry persistence entries...&#34;</span> <span class="n">-ForegroundColor</span> <span class="n">Yellow</span>
</span></span><span class="line"><span class="cl"><span class="nv">$runKeys</span> <span class="p">=</span> <span class="vm">@</span><span class="p">(</span>
</span></span><span class="line"><span class="cl">    <span class="s2">&#34;HKCU:\Software\Microsoft\Windows\CurrentVersion\Run&#34;</span><span class="p">,</span>
</span></span><span class="line"><span class="cl">    <span class="s2">&#34;HKLM:\Software\Microsoft\Windows\CurrentVersion\Run&#34;</span><span class="p">,</span>
</span></span><span class="line"><span class="cl">    <span class="s2">&#34;HKLM:\SOFTWARE\Microsoft\Windows NT\CurrentVersion\Windows&#34;</span><span class="p">,</span>  <span class="c"># AppInit_DLLs is a value under this key, not a subkey</span>
</span></span><span class="line"><span class="cl">    <span class="s2">&#34;HKCU:\Software\Microsoft\Windows\CurrentVersion\Explorer\Shell Folders&#34;</span>
</span></span><span class="line"><span class="cl"><span class="p">)</span>
</span></span><span class="line"><span class="cl"><span class="k">foreach</span> <span class="p">(</span><span class="nv">$key</span> <span class="k">in</span> <span class="nv">$runKeys</span><span class="p">)</span> <span class="p">{</span>
</span></span><span class="line"><span class="cl">    <span class="nb">Write-Host</span> <span class="s2">&#34;Checking </span><span class="nv">$key</span><span class="s2">...&#34;</span>
</span></span><span class="line"><span class="cl">    <span class="k">try</span> <span class="p">{</span>
</span></span><span class="line"><span class="cl">        <span class="nb">Get-ItemProperty</span> <span class="n">-Path</span> <span class="nv">$key</span> <span class="n">-ErrorAction</span> <span class="n">Stop</span> <span class="p">|</span> <span class="nb">Select-Object</span> <span class="p">*</span> <span class="p">|</span> <span class="nb">Format-List</span>
</span></span><span class="line"><span class="cl">    <span class="p">}</span> <span class="k">catch</span> <span class="p">{</span>
</span></span><span class="line"><span class="cl">        <span class="nb">Write-Host</span> <span class="s2">&#34;Unable to read this registry key&#34;</span> <span class="n">-ForegroundColor</span> <span class="n">Gray</span>
</span></span><span class="line"><span class="cl">    <span class="p">}</span>
</span></span><span class="line"><span class="cl"><span class="p">}</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl"><span class="c"># 3. Check WMI event subscriptions (a persistence method SilverFox commonly uses)</span>
</span></span><span class="line"><span class="cl"><span class="nb">Write-Host</span> <span class="s2">&#34;</span><span class="se">`n</span><span class="s2">[3/6] Checking WMI event subscriptions...&#34;</span> <span class="n">-ForegroundColor</span> <span class="n">Yellow</span>
</span></span><span class="line"><span class="cl"><span class="nb">Get-WmiObject</span> <span class="n">-Namespace</span> <span class="n">root</span><span class="p">\</span><span class="n">subscription</span> <span class="n">-Class</span> <span class="n">__EventFilter</span> <span class="n">-ErrorAction</span> <span class="n">SilentlyContinue</span> <span class="p">|</span> <span class="nb">ForEach-Object</span> <span class="p">{</span>
</span></span><span class="line"><span class="cl">    <span class="nb">Write-Host</span> <span class="s2">&#34;WMI event filter found: </span><span class="p">$(</span><span class="nv">$_</span><span class="p">.</span><span class="n">Name</span><span class="p">)</span><span class="s2">&#34;</span> <span class="n">-ForegroundColor</span> <span class="n">Red</span>
</span></span><span class="line"><span class="cl">    <span class="nb">Write-Host</span> <span class="s2">&#34;Query: </span><span class="p">$(</span><span class="nv">$_</span><span class="p">.</span><span class="n">Query</span><span class="p">)</span><span class="s2">&#34;</span>
</span></span><span class="line"><span class="cl"><span class="p">}</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl"><span class="c"># 4. Check scheduled tasks (task names Task1 and SilverFox are listed as IOCs)</span>
</span></span><span class="line"><span class="cl"><span class="nb">Write-Host</span> <span class="s2">&#34;</span><span class="se">`n</span><span class="s2">[4/6] Checking scheduled tasks...&#34;</span> <span class="n">-ForegroundColor</span> <span class="n">Yellow</span>
</span></span><span class="line"><span class="cl"><span class="nb">Get-ScheduledTask</span> <span class="p">|</span> <span class="nb">Where-Object</span> <span class="p">{</span> <span class="nv">$_</span><span class="p">.</span><span class="py">TaskName</span> <span class="o">-like</span> <span class="s2">&#34;*Task1*&#34;</span> <span class="o">-or</span> <span class="nv">$_</span><span class="p">.</span><span class="py">TaskName</span> <span class="o">-like</span> <span class="s2">&#34;*SilverFox*&#34;</span> <span class="o">-or</span> <span class="nv">$_</span><span class="p">.</span><span class="py">Description</span> <span class="o">-like</span> <span class="s2">&#34;*SilverFox*&#34;</span> <span class="p">}</span> <span class="p">|</span> <span class="nb">Format-Table</span> <span class="n">TaskName</span><span class="p">,</span> <span class="n">State</span><span class="p">,</span> <span class="n">Description</span> <span class="n">-AutoSize</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl"><span class="c"># 5. Check common malicious file paths</span>
</span></span><span class="line"><span class="cl"><span class="nb">Write-Host</span> <span class="s2">&#34;</span><span class="se">`n</span><span class="s2">[5/6] Scanning malicious file paths...&#34;</span> <span class="n">-ForegroundColor</span> <span class="n">Yellow</span>
</span></span><span class="line"><span class="cl"><span class="nv">$scanPaths</span> <span class="p">=</span> <span class="vm">@</span><span class="p">(</span>
</span></span><span class="line"><span class="cl">    <span class="s2">&#34;C:\ProgramData\xfolder32&#34;</span><span class="p">,</span>
</span></span><span class="line"><span class="cl">    <span class="s2">&#34;C:\Users\Public\Documents\&#34;</span><span class="p">,</span>
</span></span><span class="line"><span class="cl">    <span class="nv">$env:TEMP</span><span class="p">,</span>
</span></span><span class="line"><span class="cl">    <span class="s2">&#34;C:\Users\</span><span class="nv">$env:USERNAME</span><span class="s2">\AppData\Local\Temp&#34;</span>
</span></span><span class="line"><span class="cl"><span class="p">)</span>
</span></span><span class="line"><span class="cl"><span class="k">foreach</span> <span class="p">(</span><span class="nv">$path</span> <span class="k">in</span> <span class="nv">$scanPaths</span><span class="p">)</span> <span class="p">{</span>
</span></span><span class="line"><span class="cl">    <span class="k">if</span> <span class="p">(</span><span class="nb">Test-Path</span> <span class="nv">$path</span><span class="p">)</span> <span class="p">{</span>
</span></span><span class="line"><span class="cl">        <span class="nb">Write-Host</span> <span class="s2">&#34;Scanning </span><span class="nv">$path</span><span class="s2">...&#34;</span>
</span></span><span class="line"><span class="cl">        <span class="nb">Get-ChildItem</span> <span class="n">-Path</span> <span class="nv">$path</span> <span class="n">-Recurse</span> <span class="n">-Force</span> <span class="n">-ErrorAction</span> <span class="n">SilentlyContinue</span> <span class="p">|</span> <span class="nb">Where-Object</span> <span class="p">{</span> <span class="nv">$_</span><span class="p">.</span><span class="py">Name</span> <span class="o">-match</span> <span class="s2">&#34;svchost64\.exe|.*\.silverfox|!!!文件恢复指南.*&#34;</span> <span class="p">}</span> <span class="p">|</span> <span class="nb">ForEach-Object</span> <span class="p">{</span>
</span></span><span class="line"><span class="cl">            <span class="nb">Write-Host</span> <span class="s2">&#34;Suspicious file found: </span><span class="p">$(</span><span class="nv">$_</span><span class="p">.</span><span class="n">FullName</span><span class="p">)</span><span class="s2">&#34;</span> <span class="n">-ForegroundColor</span> <span class="n">Red</span>
</span></span><span class="line"><span class="cl">        <span class="p">}</span>
</span></span><span class="line"><span class="cl">    <span class="p">}</span>
</span></span><span class="line"><span class="cl"><span class="p">}</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl"><span class="c"># 6. Check Windows Defender exclusions (SilverFox often tampers with this setting)</span>
</span></span><span class="line"><span class="cl"><span class="nb">Write-Host</span> <span class="s2">&#34;</span><span class="se">`n</span><span class="s2">[6/6] Checking Windows Defender exclusion paths...&#34;</span> <span class="n">-ForegroundColor</span> <span class="n">Yellow</span>
</span></span><span class="line"><span class="cl"><span class="nv">$exclusions</span> <span class="p">=</span> <span class="nb">Get-MpPreference</span> <span class="p">|</span> <span class="nb">Select-Object</span> <span class="n">-ExpandProperty</span> <span class="n">ExclusionPath</span>
</span></span><span class="line"><span class="cl"><span class="k">if</span> <span class="p">(</span><span class="nv">$exclusions</span><span class="p">)</span> <span class="p">{</span>
</span></span><span class="line"><span class="cl">    <span class="nb">Write-Host</span> <span class="s2">&#34;Exclusion paths found (review each one):&#34;</span> <span class="n">-ForegroundColor</span> <span class="n">Red</span>
</span></span><span class="line"><span class="cl">    <span class="nv">$exclusions</span> <span class="p">|</span> <span class="nb">ForEach-Object</span> <span class="p">{</span> <span class="nb">Write-Host</span> <span class="nv">$_</span> <span class="p">}</span>
</span></span><span class="line"><span class="cl"><span class="p">}</span> <span class="k">else</span> <span class="p">{</span>
</span></span><span class="line"><span class="cl">    <span class="nb">Write-Host</span> <span class="s2">&#34;No exclusion paths found&#34;</span> <span class="n">-ForegroundColor</span> <span class="n">Green</span>
</span></span><span class="line"><span class="cl"><span class="p">}</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl"><span class="nb">Write-Host</span> <span class="s2">&#34;</span><span class="se">`n</span><span class="s2">Scan complete. If any of the suspicious items above were found, disconnect from the network immediately and clean up with a dedicated removal tool&#34;</span> <span class="n">-ForegroundColor</span> <span class="n">Cyan</span>
</span></span></code></pre></div><h3 id="42-linux-检测bash">4.2 Linux Detection (Bash)</h3>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="cl"><span class="cp">#!/bin/bash
</span></span></span><span class="line"><span class="cl"><span class="c1"># SilverFox detection script - Linux version</span>
</span></span><span class="line"><span class="cl"><span class="c1"># Source: zseagate/SilverFox-Scanner</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl"><span class="nb">echo</span> -e <span class="s2">&#34;\033[36m=== SilverFox Detection (Linux) ===\033[0m&#34;</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl"><span class="c1"># 1. Check for suspicious processes</span>
</span></span><span class="line"><span class="cl"><span class="nb">echo</span> -e <span class="s2">&#34;\n\033[33m[1/5] Checking for suspicious processes...\033[0m&#34;</span>
</span></span><span class="line"><span class="cl">ps aux <span class="p">|</span> grep -iE <span class="s2">&#34;silverfox|foxservice|svchost|minerd|xmrig&#34;</span> <span class="p">|</span> grep -v grep
</span></span><span class="line"><span class="cl"><span class="k">if</span> <span class="o">[</span> <span class="nv">$?</span> -eq <span class="m">0</span> <span class="o">]</span><span class="p">;</span> <span class="k">then</span>
</span></span><span class="line"><span class="cl">    <span class="nb">echo</span> -e <span class="s2">&#34;\033[31mSuspicious processes found; examine the processes listed above\033[0m&#34;</span>
</span></span><span class="line"><span class="cl"><span class="k">fi</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl"><span class="c1"># 2. Check startup entries</span>
</span></span><span class="line"><span class="cl"><span class="nb">echo</span> -e <span class="s2">&#34;\n\033[33m[2/5] Checking startup entries...\033[0m&#34;</span>
</span></span><span class="line"><span class="cl">systemctl list-unit-files --type<span class="o">=</span>service <span class="p">|</span> grep -iE <span class="s2">&#34;silverfox|malware|unknown&#34;</span>
</span></span><span class="line"><span class="cl">crontab -l 2&gt;/dev/null <span class="p">|</span> grep -iE <span class="s2">&#34;curl|wget|bash|python.*http&#34;</span>
</span></span><span class="line"><span class="cl">grep -iE <span class="s2">&#34;curl|wget|bash|python.*http&#34;</span> /etc/crontab 2&gt;/dev/null
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl"><span class="c1"># 3. Check for malicious files</span>
</span></span><span class="line"><span class="cl"><span class="nb">echo</span> -e <span class="s2">&#34;\n\033[33m[3/5] Scanning common malicious paths...\033[0m&#34;</span>
</span></span><span class="line"><span class="cl"><span class="nv">scan_dirs</span><span class="o">=(</span><span class="s2">&#34;/tmp&#34;</span> <span class="s2">&#34;/var/tmp&#34;</span> <span class="s2">&#34;/dev/shm&#34;</span> <span class="s2">&#34;/root&#34;</span> <span class="s2">&#34;/home&#34;</span><span class="o">)</span>
</span></span><span class="line"><span class="cl"><span class="k">for</span> dir in <span class="s2">&#34;</span><span class="si">${</span><span class="nv">scan_dirs</span><span class="p">[@]</span><span class="si">}</span><span class="s2">&#34;</span><span class="p">;</span> <span class="k">do</span>
</span></span><span class="line"><span class="cl">    <span class="nb">echo</span> <span class="s2">&#34;Scanning </span><span class="nv">$dir</span><span class="s2">...&#34;</span>
</span></span><span class="line"><span class="cl">    find <span class="s2">&#34;</span><span class="nv">$dir</span><span class="s2">&#34;</span> -type f <span class="se">\(</span> -name <span class="s2">&#34;*.silverfox&#34;</span> -o -name <span class="s2">&#34;*silverfox*&#34;</span> -o -name <span class="s2">&#34;foxservice&#34;</span> <span class="se">\)</span> 2&gt;/dev/null
</span></span><span class="line"><span class="cl"><span class="k">done</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl"><span class="c1"># 4. Check network connections</span>
</span></span><span class="line"><span class="cl"><span class="nb">echo</span> -e <span class="s2">&#34;\n\033[33m[4/5] Checking for suspicious network connections...\033[0m&#34;</span>
</span></span><span class="line"><span class="cl">netstat -antp 2&gt;/dev/null <span class="p">|</span> grep -iE <span class="s2">&#34;estab|listen&#34;</span> <span class="p">|</span> grep -v <span class="s2">&#34;:22\|:80\|:443&#34;</span> <span class="p">|</span> grep -v <span class="s2">&#34;127.0.0.1&#34;</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl"><span class="c1"># 5. Check recently modified files</span>
</span></span><span class="line"><span class="cl"><span class="nb">echo</span> -e <span class="s2">&#34;\n\033[33m[5/5] Checking executables modified in the last 24 hours...\033[0m&#34;</span>
</span></span><span class="line"><span class="cl">find / -type f -mtime -1 -perm /u+x 2&gt;/dev/null <span class="p">|</span> grep -vE <span class="s2">&#34;/bin|/sbin|/usr/bin|/usr/sbin&#34;</span> <span class="p">|</span> head -20
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl"><span class="nb">echo</span> -e <span class="s2">&#34;\n\033[36mScan complete. If anything suspicious was found, isolate and clean the host promptly\033[0m&#34;</span>
</span></span></code></pre></div><h3 id="43-macos-检测bash">4.3 macOS Detection (Bash)</h3>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="cl"><span class="cp">#!/bin/bash
</span></span></span><span class="line"><span class="cl"><span class="c1"># SilverFox detection script - macOS version</span>
</span></span><span class="line"><span class="cl"><span class="c1"># Source: zseagate/SilverFox-Scanner</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl"><span class="nb">echo</span> -e <span class="s2">&#34;\033[36m=== SilverFox Detection (macOS) ===\033[0m&#34;</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl"><span class="c1"># 1. Check for suspicious processes</span>
</span></span><span class="line"><span class="cl"><span class="nb">echo</span> -e <span class="s2">&#34;\n\033[33m[1/5] Checking for suspicious processes...\033[0m&#34;</span>
</span></span><span class="line"><span class="cl">ps aux <span class="p">|</span> grep -iE <span class="s2">&#34;silverfox|foxservice|svchost&#34;</span> <span class="p">|</span> grep -v grep
</span></span><span class="line"><span class="cl"><span class="k">if</span> <span class="o">[</span> <span class="nv">$?</span> -eq <span class="m">0</span> <span class="o">]</span><span class="p">;</span> <span class="k">then</span>
</span></span><span class="line"><span class="cl">    <span class="nb">echo</span> -e <span class="s2">&#34;\033[31mSuspicious processes found; examine the processes listed above\033[0m&#34;</span>
</span></span><span class="line"><span class="cl"><span class="k">fi</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl"><span class="c1"># 2. Check startup items and LoginHook</span>
</span></span><span class="line"><span class="cl"><span class="nb">echo</span> -e <span class="s2">&#34;\n\033[33m[2/5] Checking startup items...\033[0m&#34;</span>
</span></span><span class="line"><span class="cl">launchctl list <span class="p">|</span> grep -iE <span class="s2">&#34;silverfox|unknown|malware&#34;</span>
</span></span><span class="line"><span class="cl">defaults <span class="nb">read</span> com.apple.loginwindow LoginHook 2&gt;/dev/null
</span></span><span class="line"><span class="cl">defaults <span class="nb">read</span> com.apple.loginwindow LogoutHook 2&gt;/dev/null
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl"><span class="c1"># 3. 检查LaunchAgents/LaunchDaemons</span>
</span></span><span class="line"><span class="cl"><span class="nb">echo</span> -e <span class="s2">&#34;\n\033[33m[3/5] 检查Launch配置...\033[0m&#34;</span>
</span></span><span class="line"><span class="cl"><span class="nv">launch_dirs</span><span class="o">=(</span>
</span></span><span class="line"><span class="cl">    <span class="s2">&#34;/Library/LaunchAgents&#34;</span>
</span></span><span class="line"><span class="cl">    <span class="s2">&#34;/Library/LaunchDaemons&#34;</span>
</span></span><span class="line"><span class="cl">    <span class="s2">&#34;</span><span class="nv">$HOME</span><span class="s2">/Library/LaunchAgents&#34;</span>
</span></span><span class="line"><span class="cl"><span class="o">)</span>
</span></span><span class="line"><span class="cl"><span class="k">for</span> dir in <span class="s2">&#34;</span><span class="si">${</span><span class="nv">launch_dirs</span><span class="p">[@]</span><span class="si">}</span><span class="s2">&#34;</span><span class="p">;</span> <span class="k">do</span>
</span></span><span class="line"><span class="cl">    <span class="nb">echo</span> <span class="s2">&#34;Checking </span><span class="nv">$dir</span><span class="s2">...&#34;</span>
</span></span><span class="line"><span class="cl">    ls -la <span class="s2">&#34;</span><span class="nv">$dir</span><span class="s2">&#34;</span> 2&gt;/dev/null <span class="p">|</span> grep -iE <span class="s2">&#34;silverfox|foxservice|unknown&#34;</span>
</span></span><span class="line"><span class="cl"><span class="k">done</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl"><span class="c1"># 4. Scan for malicious files</span>
</span></span><span class="line"><span class="cl"><span class="nb">echo</span> -e <span class="s2">&#34;\n\033[33m[4/5] Scanning for malicious files...\033[0m&#34;</span>
</span></span><span class="line"><span class="cl"><span class="nv">scan_dirs</span><span class="o">=(</span><span class="s2">&#34;/tmp&#34;</span> <span class="s2">&#34;/var/tmp&#34;</span> <span class="s2">&#34;</span><span class="nv">$HOME</span><span class="s2">/Downloads&#34;</span> <span class="s2">&#34;</span><span class="nv">$HOME</span><span class="s2">/Documents&#34;</span> <span class="s2">&#34;/Applications&#34;</span><span class="o">)</span>
</span></span><span class="line"><span class="cl"><span class="k">for</span> dir in <span class="s2">&#34;</span><span class="si">${</span><span class="nv">scan_dirs</span><span class="p">[@]</span><span class="si">}</span><span class="s2">&#34;</span><span class="p">;</span> <span class="k">do</span>
</span></span><span class="line"><span class="cl">    <span class="c1"># Case-insensitive match without -type f, so .app bundles (which are directories) are found too</span>
</span></span><span class="line"><span class="cl">    find <span class="s2">&#34;</span><span class="nv">$dir</span><span class="s2">&#34;</span> -iname <span class="s2">&#34;*silverfox*&#34;</span> 2&gt;/dev/null
</span></span><span class="line"><span class="cl"><span class="k">done</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl"><span class="c1"># 5. Check network connections (ports 22, 80 and 443 are matched exactly, so 8080 or 4433 are still reported)</span>
</span></span><span class="line"><span class="cl"><span class="nb">echo</span> -e <span class="s2">&#34;\n\033[33m[5/5] Checking for suspicious network connections...\033[0m&#34;</span>
</span></span><span class="line"><span class="cl">lsof -i -P <span class="p">|</span> grep -iE <span class="s2">&#34;listen|established&#34;</span> <span class="p">|</span> grep -vE <span class="s1">&#39;:(22|80|443)([^0-9]|$)&#39;</span> <span class="p">|</span> grep -v <span class="s2">&#34;127.0.0.1&#34;</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl"><span class="nb">echo</span> -e <span class="s2">&#34;\n\033[36mCheck complete. If anything suspicious was found, follow up with a professional security tool\033[0m&#34;</span>
</span></span></code></pre></div><hr>
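<p>Step 5 of the script filters <code>lsof</code> output by port. The same idea can be written as a port allowlist that is matched exactly against both endpoints of a connection. The following is a minimal sketch, not part of the original script: <code>filter_unusual_ports</code> and <code>ALLOWED_PORTS</code> are hypothetical names, and it assumes the usual <code>lsof -i -P</code> layout in which the address column is the second-to-last field.</p>

```shell
#!/usr/bin/env bash
# Sketch only: filter_unusual_ports and ALLOWED_PORTS are illustrative names.
ALLOWED_PORTS="22 80 443"

filter_unusual_ports() {
    # Reads `lsof -i -P` style lines on stdin and prints the ones whose
    # local and remote ports are both outside the allowlist.
    awk -v allowed="$ALLOWED_PORTS" '
        BEGIN {
            split(allowed, a, " ")
            for (i in a) ok[a[i]] = 1
        }
        /LISTEN|ESTABLISHED/ {
            # Address column, e.g. *:22 or 10.0.0.5:52344->203.0.113.9:443
            name = $(NF - 1)
            if (name ~ /127\.0\.0\.1/) next
            split(name, ends, "->")
            skip = 0
            for (j in ends) {
                port = ends[j]
                sub(/.*:/, "", port)   # keep only the text after the last colon
                if (port in ok) skip = 1
            }
            if (!skip) print
        }
    '
}
```

<p>A possible invocation is <code>lsof -i -P -n | filter_unusual_ports</code>. Because ports are compared as whole values, a listener on 8080 is reported even though 80 is on the allowlist.</p>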
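<h2 id="五yara-规则整合版">5. YARA Rules (Consolidated)</h2>
<p>The YARA rules below combine process-name, WMI, file, Go-binary, and registry-persistence indicators. Treat them as a starting point and test them against known-clean files first, because several of the strings are generic and also occur in benign software.</p>
<h3 id="51-银狐病毒完整yara规则">5.1 Complete YARA Rules for Silver Fox</h3>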
<pre tabindex="0"><code class="language-yara" data-lang="yara">rule SilverFox_Complete {
    meta:
        description = &#34;Complete Silver Fox detection rule (process names + WMI + file indicators + Go indicators + registry)&#34;
        author = &#34;Based on zseagate/SilverFox-Scanner&#34;
        date = &#34;2026-05-25&#34;
        reference = &#34;https://github.com/zseagate/SilverFox-Scanner&#34;
        version = &#34;1.0&#34;
    
    strings:
        // === Process name indicators ===
        $proc1 = &#34;foxservice.exe&#34;
        $proc2 = &#34;xfolder32&#34;
        $proc3 = &#34;silverfox&#34; nocase
        $proc4 = &#34;svchost64.exe&#34;
        $proc5 = &#34;pXDc9LSz.exe&#34;
        $proc6 = &#34;pQpfOm.exe&#34;
        
        // === WMI persistence indicators ===
        $wmi1 = &#34;__EventFilter&#34;
        $wmi2 = &#34;__EventConsumer&#34;
        $wmi3 = &#34;__FilterToConsumerBinding&#34;
        $wmi4 = &#34;root\\subscription&#34;
        
        // === File indicators ===
        $ext1 = &#34;.silverfox&#34;
        $name1 = &#34;foxservice&#34;
        $name2 = &#34;svchost64.exe&#34;
        $name3 = &#34;!!!文件恢复指南&#34;
        $name4 = &#34;xfolder32&#34;
        
        // === Go indicators ===
        $go1 = &#34;go.buildinfo&#34;
        $go2 = &#34;runtime&#34;
        $go3 = &#34;GOTRACEBACK&#34;
        
        // === Registry indicators ===
        $reg1 = &#34;CurrentVersion\\Run&#34;
        $reg2 = &#34;AppInit_DLLs&#34;
        $reg3 = &#34;Shell Folders&#34;
    
    condition:
        // Process, file, and WMI strings match on their own.
        // Go strings occur in every Go binary, so they count only together with
        // a registry persistence string.
        // Every declared string is referenced; YARA rejects unreferenced strings.
        any of ($proc*) or any of ($name*) or $ext1 or any of ($wmi*) or
        (any of ($go*) and any of ($reg*))
}

rule SilverFox_Process {
    meta:
        description = &#34;Silver Fox process name detection&#34;
        author = &#34;Based on zseagate/SilverFox-Scanner&#34;
        date = &#34;2026-05-25&#34;
    
    strings:
        $proc1 = &#34;foxservice.exe&#34;
        $proc2 = &#34;xfolder32&#34;
        $proc3 = &#34;silverfox&#34; nocase
        $proc4 = &#34;svchost64.exe&#34;
        $proc5 = &#34;pXDc9LSz.exe&#34;
        $proc6 = &#34;pQpfOm.exe&#34;
    
    condition:
        any of them
}

rule SilverFox_WMI {
    meta:
        description = &#34;Silver Fox WMI persistence detection&#34;
        author = &#34;Based on zseagate/SilverFox-Scanner&#34;
        date = &#34;2026-05-25&#34;
    
    strings:
        $wmi1 = &#34;__EventFilter&#34;
        $wmi2 = &#34;__EventConsumer&#34;
        $wmi3 = &#34;__FilterToConsumerBinding&#34;
        $wmi4 = &#34;root\\subscription&#34;
    
    condition:
        any of them
}

rule SilverFox_File {
    meta:
        description = &#34;Silver Fox file indicator detection&#34;
        author = &#34;Based on zseagate/SilverFox-Scanner&#34;
        date = &#34;2026-05-25&#34;
    
    strings:
        $ext1 = &#34;.silverfox&#34;
        $name1 = &#34;foxservice&#34;
        $name2 = &#34;svchost64.exe&#34;
        $name3 = &#34;!!!文件恢复指南&#34;
        $name4 = &#34;xfolder32&#34;
    
    condition:
        any of them
}

rule SilverFox_GoBinary {
    meta:
        description = &#34;Silver Fox Go binary detection&#34;
        author = &#34;Based on zseagate/SilverFox-Scanner&#34;
        date = &#34;2026-05-25&#34;
    
    strings:
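        // Go runtime indicators
        $go1 = &#34;go.buildinfo&#34;
        $go2 = &#34;runtime&#34;
        $go3 = &#34;GOTRACEBACK&#34;
        
        // Silver Fox-specific strings
        $sf1 = &#34;foxservice&#34; nocase
        $sf2 = &#34;silverfox&#34; nocase
        $sf3 = &#34;xfolder&#34; nocase
    
    condition:
        // Require a Go indicator and a Silver Fox string together; a Go indicator
        // alone would flag every Go binary, and an unused $go3 would not compile
        any of ($go*) and any of ($sf*)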
}

rule SilverFox_Registry {
    meta:
        description = &#34;Silver Fox registry persistence detection&#34;
        author = &#34;Based on zseagate/SilverFox-Scanner&#34;
        date = &#34;2026-05-25&#34;
    
    strings:
        $reg1 = &#34;CurrentVersion\\Run&#34;
        $reg2 = &#34;AppInit_DLLs&#34;
        $reg3 = &#34;Shell Folders&#34;
    
    condition:
        any of them
}
</code></pre><h3 id="52-使用示例">5.2 Usage Examples</h3>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="cl"><span class="c1"># Scan the entire system</span>
</span></span><span class="line"><span class="cl">yara -r silverfox.yar /
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl"><span class="c1"># Scan a specific directory</span>
</span></span><span class="line"><span class="cl">yara silverfox.yar /tmp
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl"><span class="c1"># Scan the memory of a running process: pass its PID as the target (usually needs root or administrator rights)</span>
</span></span><span class="line"><span class="cl">yara silverfox.yar &lt;pid&gt;
</span></span></code></pre></div><hr>
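<p>Several of the rules in 5.1 fire on a single generic string, so a raw scan can produce many hits. One way to triage is to rank files by how many distinct rules matched them. The sketch below assumes the default <code>yara</code> output format of one <code>RULE_NAME FILE_PATH</code> pair per line; <code>rank_files_by_rule_hits</code> is a hypothetical helper name.</p>

```shell
#!/usr/bin/env bash
# Sketch only: rank_files_by_rule_hits is an illustrative helper name.

rank_files_by_rule_hits() {
    # stdin:  default yara output, "RULE_NAME FILE_PATH" per line
    # stdout: "COUNT<TAB>FILE_PATH", highest count first
    awk '
        {
            sub(/^[^ ]+ /, "")   # drop the rule name; the rest is the path (spaces allowed)
            hits[$0]++
        }
        END {
            for (f in hits) printf "%d\t%s\n", hits[f], f
        }
    ' | sort -rn
}
```

<p>A possible invocation is <code>yara -r silverfox.yar "$HOME/Downloads" 2>/dev/null | rank_files_by_rule_hits</code>. Files matched by several rules are usually worth inspecting before files matched by only one.</p>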
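<h2 id="六检测流程示例">6. Example Detection Workflows</h2>
<h3 id="61-企业环境检测流程">6.1 Enterprise Detection Workflow</h3>
<pre tabindex="0"><code>Step 1: Network isolation
├── Disconnect the suspicious host from the network as soon as it is identified
└── Prevent C2 communication and data exfiltration

Step 2: Initial scan
├── Run the Silver Fox detection script
├── Check for malicious processes, registry entries, WMI subscriptions, and scheduled tasks
└── Record every suspicious item

Step 3: In-depth analysis
├── Perform memory analysis on suspicious processes
├── Extract C2 communication indicators
└── Analyze persistence mechanisms

Step 4: Cleanup and recovery
├── Clean up with a dedicated removal tool
├── Restore the Windows Defender configuration
├── Reset the registry and scheduled tasks
└── Change all credentials

Step 5: Attribution and reporting
├── Analyze the infection source
├── Record IOCs
└── Submit threat intelligence
</code></pre><h3 id="62-个人用户检测流程">6.2 Detection Workflow for Individual Users</h3>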
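<pre tabindex="0"><code>Step 1: Download a dedicated removal tool
├── Huorong Silver Fox removal tool: https://down5.huorong.cn/tools/Hrkill-SilverFox.exe
├── Sangfor removal tool: https://download.sangfor.com.cn/download/product/edr/antivirus_tool/sfakiller_x64.exe
└── das-secbox Silver Fox removal tool: https://github.com/das-secbox/silverfox_scanner/releases

Step 2: Run a scan
├── Run a full-disk scan
├── Wait for the results
└── Remove any threats found

Step 3: Manual checks
├── Check Task Manager for suspicious processes
├── Check startup items for anomalies
└── Check the browser for unexpected extensions

Step 4: Change credentials
├── Change the passwords of all important accounts
├── Review passwords saved in the browser
└── Enable two-factor authentication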
</code></pre><hr>
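<p>The enterprise workflow in 6.1 asks for suspicious items and IOCs to be recorded. A file hash is a common form for such a record. The sketch below prints a SHA-256 hash and path as one tab-separated line; <code>record_ioc</code> is a hypothetical helper name, and the fallback to <code>shasum</code> assumes a macOS host without <code>sha256sum</code>.</p>

```shell
#!/usr/bin/env bash
# Sketch only: record_ioc is an illustrative helper name.

record_ioc() {
    # Prints "SHA256<TAB>PATH" for the given file.
    local file="$1" hash
    if command -v sha256sum >/dev/null; then
        hash="$(sha256sum "$file" | awk '{ print $1 }')"
    else
        # macOS ships shasum rather than sha256sum
        hash="$(shasum -a 256 "$file" | awk '{ print $1 }')"
    fi
    printf '%s\t%s\n' "$hash" "$file"
}
```

<p>For example, <code>record_ioc /tmp/suspicious.bin >> iocs.tsv</code> appends one record per file; both file names here are placeholders.</p>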
<h2 id="七开源检测工具">7. Open-Source and Free Detection Tools</h2>
<table>
	<thead>
			<tr>
					<th>Tool</th>
					<th>Author</th>
					<th>Features</th>
					<th>Link</th>
			</tr>
	</thead>
	<tbody>
			<tr>
					<td>silverfox_scanner</td>
					<td>大安全 (das-secbox)</td>
					<td>Signature database updates automatically every 30 minutes</td>
					<td><a href="https://github.com/das-secbox/silverfox_scanner">GitHub</a></td>
			</tr>
			<tr>
					<td>SilverFox-Scanner</td>
					<td>zseagate</td>
					<td>Cross-platform (Windows/Linux/macOS)</td>
					<td><a href="https://github.com/zseagate/SilverFox-Scanner">GitHub</a></td>
			</tr>
			<tr>
					<td>Huorong Silver Fox removal tool</td>
					<td>Huorong Security</td>
					<td>Free dedicated removal tool</td>
					<td><a href="https://down5.huorong.cn/tools/Hrkill-SilverFox.exe">Download</a></td>
			</tr>
			<tr>
					<td>Sangfor removal tool</td>
					<td>Sangfor</td>
					<td>Free dedicated removal tool</td>
					<td><a href="https://download.sangfor.com.cn/download/product/edr/antivirus_tool/sfakiller_x64.exe">Download</a></td>
			</tr>
	</tbody>
</table>
<hr>
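<h2 id="八局限性说明">8. Limitations</h2>
<table>
	<thead>
			<tr>
					<th>Aspect</th>
					<th>Status</th>
					<th>Notes</th>
			</tr>
	</thead>
	<tbody>
			<tr>
					<td><strong>IOC source</strong></td>
					<td>✅ Verified</td>
					<td>Taken from the source code of open-source detection tools</td>
			</tr>
			<tr>
					<td><strong>Latest IOCs</strong></td>
					<td>⚠️ Needs updating</td>
					<td>Available from the das-secbox signature database (updated every 30 minutes)</td>
			</tr>
			<tr>
					<td><strong>Sample analysis</strong></td>
					<td>❌ None</td>
					<td>Requires obtaining samples and analyzing them in an isolated environment</td>
			</tr>
			<tr>
					<td><strong>C2 attribution</strong></td>
					<td>❌ None</td>
					<td>Requires a professional security team</td>
			</tr>
			<tr>
					<td><strong>Go binary detection</strong></td>
					<td>⚠️ Partial</td>
					<td>The YARA rules are based on public indicators and may be incomplete</td>
			</tr>
	</tbody>
</table>
<blockquote>
<p><strong>Recommendation</strong>: download <a href="https://github.com/das-secbox/silverfox_scanner/releases">das-secbox/silverfox_scanner</a> to get the latest signature database.</p>
</blockquote>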
<hr>
<h2 id="九参考资源">9. References</h2>
<ul>
<li><a href="https://ti.qq.com/">Tencent Security: Silver Fox Trojan family analysis report</a></li>
<li><a href="https://ti.360.cn/">360 Threat Intelligence Center</a></li>
<li><a href="https://x.threatbook.com/">ThreatBook threat intelligence</a></li>
<li><a href="https://www.virustotal.com/">VirusTotal</a></li>
<li><a href="https://otx.alienvault.com/">AlienVault OTX</a></li>
<li><a href="https://github.com/das-secbox/silverfox_scanner">das-secbox/silverfox_scanner</a></li>
<li><a href="https://github.com/zseagate/SilverFox-Scanner">zseagate/SilverFox-Scanner</a></li>
</ul>
<hr>
<p><em>The IOCs in this article come from open-source detection tools; for the latest IOCs, use the official removal tools.</em></p>
]]></content:encoded></item></channel></rss>