ICCV 2021 | Talk-to-Edit: Fine-Grained Face Editing Through Dialog
Talk-to-Edit: Fine-Grained Facial Editing via Dialog
Yuming Jiang¹, Ziqi Huang¹, Xingang Pan², Chen Change Loy¹, Ziwei Liu¹
¹S-Lab, Nanyang Technological University ²The Chinese University of Hong Kong
{yuming002, hu0007qi, ccloy, ziwei.liu}@ntu.edu.sg px117@ie.cuhk.edu.hk
Part 1: Without further ado, the pictures first
The user edits a face by talking with the system:

Editing individual facial attributes:

Editing photos of handsome men and beautiful women (using GAN inversion):

Part 2: Method and Results
How is all of this achieved? The Talk-to-Edit pipeline is shown in the figure below:

We use a Semantic Field to edit facial attributes continuously and with fine-grained control. The dialog capability is provided by the Language Encoder and the Talk module. Below we explain how each module works and what it achieves.
2.1 Semantic Field
Background: GANs [1, 2] generate different images from different latent vectors in their latent space. Latent-space image editing methods [3, 4, 5, 6, 7] take a pretrained GAN and its latent space and change the latent vector of an image in a controlled way, which edits the image indirectly. However, these methods assume that a given attribute of a face can be edited by "walking in a straight line" along some direction in the latent space (brown path (1) in figure (b) below).

Our method drops the "straight line" assumption. While "walking", it keeps looking for the best direction to move in, based on the current latent vector (black path (2) in figure (b) above). To do this, we build a vector field in the latent space that gives the best "forward direction" for every latent vector. Moving a latent vector along its current best "forward direction" changes one semantic attribute of the image. We call this vector field the Semantic Field. Our way of editing is equivalent to moving along the field lines of the vector field, in the direction in which the potential increases fastest. Here the potential is the degree of a given attribute. For example, when editing the "bangs" attribute, the latent vector moves along a field line in the direction in which the bangs grow longer fastest (black path (2) in figure (b) above).
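The contrast between the two kinds of path can be written compactly. The notation below is introduced here for illustration and is not taken from the article: z is a latent vector, α a step size, n a fixed editing direction, f the Semantic Field, and S the potential (the attribute degree). Writing f as the gradient of S is one reading of the statement that the edit moves where the potential increases fastest.

```latex
% Straight-line assumption: one fixed direction n is reused at every step
z' = z + \alpha\, n

% Semantic Field: the direction is re-evaluated at the current latent vector
z_{k+1} = z_k + \alpha\, f(z_k), \qquad f = \nabla S
```

In the second update the direction depends on z_k, which is what lets the path curve.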
The Semantic Field has two properties: 1) for the same person, the "best forward direction" keeps changing as one attribute is edited further and further; 2) when editing the same attribute, the "best forward direction" differs from person to person. We model the Semantic Field with a neural network and train it as shown in figure (a) above. For more implementation details, please refer to the paper and the code.
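Both properties can be seen in a toy setting. The sketch below is not the authors' implementation: it replaces the learned network with a hand-written 2-D potential (`degree`) whose normalized gradient plays the role of the field, and every name in it is hypothetical.

```python
import math

def degree(z):
    """Toy 'potential': how strongly the attribute is present at latent z (assumed form)."""
    return z[0] + 0.5 * z[1] ** 2

def field(z):
    """Unit vector along the toy potential's gradient, (1, z[1]), at latent z."""
    gx, gy = 1.0, z[1]
    norm = math.hypot(gx, gy)
    return (gx / norm, gy / norm)

def walk_along_field(z, step=0.1, n_steps=20):
    """Re-query the field at every step, giving a curved path."""
    path = [z]
    for _ in range(n_steps):
        d = field(z)
        z = (z[0] + step * d[0], z[1] + step * d[1])
        path.append(z)
    return path

def walk_straight(z, step=0.1, n_steps=20):
    """Fix the direction once at the start (the 'straight line' assumption)."""
    d = field(z)
    return [(z[0] + k * step * d[0], z[1] + k * step * d[1])
            for k in range(n_steps + 1)]

person_a, person_b = (0.0, 0.5), (0.0, -1.5)  # two hypothetical latent vectors
path_a = walk_along_field(person_a)

print(field(path_a[0]), field(path_a[-1]))  # property 1: direction changes along the walk
print(field(person_a), field(person_b))     # property 2: direction differs between people
print(degree(path_a[-1]), degree(walk_straight(person_a)[-1]))
```

In the real system the field is a trained network over a GAN latent space, so the closed-form gradient here is only a stand-in that makes the behavior easy to inspect.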
As the table below shows, the experiments indicate that, compared with baselines built on the "straight line" assumption, our method better preserves the person's identity during face editing and, when editing one semantic attribute, changes other unrelated attributes less.

The figure makes the difference clear as well:

2.2 Language Encoder and Talk Module
ΪÁ˸øÓû§Ìṩ¸ü±ã½ÝÖ±¹ÛµÄ½»»¥·½·¨£¬£¬£¬£¬£¬£¬£¬ÎÒÃÇʹÓöԻ°µÄ·½·¨ÈÃÓû§ÊµÏֱ༡£¡£¡£¡£¡£Talk-to-EditÓÃÒ»¸ö»ùÓÚLSTMµÄLanguage EncoderÀ´Ã÷È·Óû§µÄ±à¼ÒªÇ󣬣¬£¬£¬£¬£¬£¬²¢½«±àÂëºóµÄ±à¼ÒªÇóת´ï¸øSemantic Field´Ó¶øÖ¸µ¼±à¼¡£¡£¡£¡£¡£TalkÄ£¿£¿£¿£¿£¿£¿é¿ÉÒÔÔÚÿÂֱ༺óÏòÓû§È·ÈÏϸÁ£¶ÈµÄ±à¼Ë®Æ½£¬£¬£¬£¬£¬£¬£¬ºÃ±ÈÏòÓû§È·ÈÏÏÖÔÚµÄЦÈÝÊÇ·ñǡǡºÏÊÊ£¬£¬£¬£¬£¬£¬£¬ÊÇ·ñÐèÒªÔÙ¶àÒ»µµ¡£¡£¡£¡£¡£Talk Ä£¿£¿£¿£¿£¿£¿éÒ²¿ÉÒÔΪÓû§ÌṩÆäËû±à¼½¨Ò飬£¬£¬£¬£¬£¬£¬ºÃ±Èϵͳ·¢Ã÷Óû§´ÓδʵÑé¹ý±à¼ÑÛ¾µÕâ¸öÌØÕ÷£¬£¬£¬£¬£¬£¬£¬ÓÚÊÇѯÎÊÓû§ÊÇ·ñÏëÊÔÒ»ÊÔ¸øÕÕÆ¬¼Ó¸öÑÛ¾µ¡£¡£¡£¡£¡£
Part 3: The CelebA-Dialog Dataset

Building on the CelebA [8] dataset, we provide the research community with the CelebA-Dialog dataset:
(1) We provide fine-grained attribute annotations for every image. As shown in the figure above, the "smile" attribute is divided into 6 levels according to how broad the smile is. CelebA-Dialog labels which of the 6 levels the smile in each image belongs to.
(2) We provide rich natural-language descriptions, including fine-grained natural-language descriptions of each semantic attribute of every image (image captions) and one user request for editing the image.
CelebA-Dialog can provide supervision for a variety of tasks, such as fine-grained facial attribute recognition and natural-language-based face generation and editing.
In Talk-to-Edit, we use the fine-grained attribute annotations of CelebA-Dialog to train a fine-grained facial attribute predictor, which in turn provides fine-grained supervision for training the Semantic Field.
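One way such a predictor could supervise an edit is sketched below. The article does not spell out the loss, so this is an assumption: the edited face is asked to land one level higher than before, and the predictor's output is scored with cross-entropy. The 0 to 5 level indexing, the function names, and the probability values are illustrative only.

```python
import math

NUM_LEVELS = 6  # from the text: each attribute is divided into 6 levels

def target_level(current_level, delta=1):
    """Level the edited face should reach, clamped to the valid range (assumed 0..5)."""
    return max(0, min(NUM_LEVELS - 1, current_level + delta))

def predictor_loss(predicted_probs, target):
    """Cross-entropy between the predictor's level distribution and the target level."""
    return -math.log(predicted_probs[target])

current = 2
target = target_level(current)

# Hypothetical predictor outputs for two candidate edits of the same face.
on_target = [0.02, 0.03, 0.10, 0.70, 0.10, 0.05]  # most mass on the target level
unchanged = [0.02, 0.03, 0.70, 0.10, 0.10, 0.05]  # most mass on the old level

print(predictor_loss(on_target, target), predictor_loss(unchanged, target))
```

The edit that actually reaches the target level gets the lower loss, which is the signal a field network would be trained to follow under this assumption.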
Part 4: Summary
(1) This work proposes Talk-to-Edit, a dialog-based, fine-grained face editing system.
(2) We propose the "Semantic Field": a semantic field learned in the GAN latent space. "Walking" along its field lines in the latent space gives continuous, fine-grained editing of facial attributes.
(3) We contribute a large-scale dataset, CelebA-Dialog, to the research community. We believe it will be useful for future work on fine-grained face editing and on natural-language-driven vision tasks.
Here are some more examples of what Talk-to-Edit can do:



References:
[1] Tero Karras, Samuli Laine, and Timo Aila. A style-based generator architecture for generative adversarial networks. In CVPR, pages 4401–4410, 2019.
[2] Tero Karras, Samuli Laine, Miika Aittala, Janne Hellsten, Jaakko Lehtinen, and Timo Aila. Analyzing and improving the image quality of StyleGAN. In CVPR, pages 8110–8119, 2020.
[3] Yujun Shen, Ceyuan Yang, Xiaoou Tang, and Bolei Zhou. InterFaceGAN: Interpreting the disentangled face representation learned by GANs. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2020.
[4] Yujun Shen, Jinjin Gu, Xiaoou Tang, and Bolei Zhou. Interpreting the latent space of GANs for semantic face editing. In CVPR, pages 9243–9252, 2020.
[5] Yujun Shen and Bolei Zhou. Closed-form factorization of latent semantics in GANs. arXiv preprint arXiv:2007.06600, 2020.
[6] Erik Härkönen, Aaron Hertzmann, Jaakko Lehtinen, and Sylvain Paris. GANSpace: Discovering interpretable GAN controls. arXiv preprint arXiv:2004.02546, 2020.
[7] Andrey Voynov and Artem Babenko. Unsupervised discovery of interpretable directions in the GAN latent space. In ICML, pages 9786–9796. PMLR, 2020.
[8] Ziwei Liu, Ping Luo, Xiaogang Wang, and Xiaoou Tang. Deep learning face attributes in the wild. In ICCV, pages 3730–3738, 2015.




