Fashion Editing with Adversarial Parsing Learning
该文为人物的时尚编辑
网络包含三个部分:
-
Free-form Parsing Network
-
Parsing-aware Inpainting Network
-
Attention Normalization Layers
Free-form Parsing Network
- 给定不完整的人体语义分析图以及任意的草图和笔画,能够合成完整的人体语义分析图。
网络结构 : U-net
输入:
-
an incomplete parsing map,
2. a binary sketch that describes the structure of the removed region 3. a noise sampled from the Gaussian distribution, 4. sparse color strokes 5. a mask.
注意:相同的incomplete parsing map和不同的sketch和strokes能够合成不同的parsing map,这意味这parsing generation model是可控的。
Parsing-aware Inpainting Network
网络结构:
输入:
- Incomplete image + composed mask -> partial conv encoder
- Synthesized parsing -> standard conv encoder
Attention Normalization Layers
网络结构:
数学表达式:
其中
$\alpha$:leaned attention map
$\beta$ : bias
Sketch+Color+Noise ->Conv->Conv ->Sigmoid-> leaned attention map
Sketch+Color+Noise ->Conv->Conv-> bias
Attention normalization layers有效性的解释:
The effectiveness of ANLs is due to their inherent characteristics. Similar to SPADE, ANLs also can avoid washing away semantic information in activations, since the attention map and bias are spatially-varying. Moreover, the multi-scale ANLs can not only adapt the various scales of activations during upsampling but also extract coarse-to-fine semantic information from external data, which guide the fashion editing more precisely.
- avoid washing away semantic information in activations - > attention map and bias are spatially-varying
- can not only adapt the various scales of activations during upsampling but also extract coarse-to-fine semantic information from external data ->多尺寸的激活值,以及从“粗到细”的语义信息
与SPADE的比较: