Fashion Editing with Adversarial Parsing Learning

shilongshen included in 深度学习论文阅读笔记

2020-11-15 201 words One minute

Contents

该文为人物的时尚编辑

网络包含三个部分：

Free-form Parsing Network
Parsing-aware Inpainting Network
Attention Normalization Layers

Free-form Parsing Network

给定不完整的人体语义分析图以及任意的草图和笔画，能够合成完整的人体语义分析图。

网络结构： U-net

输入：

an incomplete parsing map,

           2.   a binary sketch that describes the structure of the removed region
           3.   a noise sampled from the Gaussian distribution,
           4.   sparse color strokes 
           5.   a mask.

注意：相同的incomplete parsing map和不同的sketch和strokes能够合成不同的parsing map,这意味这parsing generation model是可控的。

Parsing-aware Inpainting Network

网络结构：

输入：

Incomplete image + composed mask -> partial conv encoder
Synthesized parsing -> standard conv encoder

Attention Normalization Layers

网络结构：

数学表达式：

其中

$\alpha$：leaned attention map

$\beta$ : bias

Sketch+Color+Noise ->Conv->Conv ->Sigmoid-> leaned attention map

Sketch+Color+Noise ->Conv->Conv-> bias

Attention normalization layers有效性的解释：

The effectiveness of ANLs is due to their inherent characteristics. Similar to SPADE, ANLs also can avoid washing away semantic information in activations, since the attention map and bias are spatially-varying. Moreover, the multi-scale ANLs can not only adapt the various scales of activations during upsampling but also extract coarse-to-fine semantic information from external data, which guide the fashion editing more precisely.

avoid washing away semantic information in activations - > attention map and bias are spatially-varying
can not only adapt the various scales of activations during upsampling but also extract coarse-to-fine semantic information from external data ->多尺寸的激活值，以及从“粗到细”的语义信息

与SPADE的比较：