
CVPR 2021 | "Moving People with Sound": Pose-Controllable Audio-Driven Talking Faces

2021-08-10

This work uses no hand-defined structural information (facial landmarks or 3D face models), yet achieves audio-driven talking face generation for arbitrary subjects with controllable head pose. The key idea is to implicitly define a 12-dimensional pose code in the latent space, which is used to control head motion.


Compared with previous methods, this work avoids the trouble caused by inaccurate landmark or 3D model estimation while keeping both flexibility and robustness. The audio drives accurate mouth shapes while a separate video drives the head motion. Under this framework, we can make anyone deliver Master Ma's classic "no martial ethics" speech. The easter egg is at the end of our demo video!


This work is a collaboration between the CUHK-SenseTime Joint Lab, SenseTime, S-Lab at Nanyang Technological University, and others.

[Image]

The mouth shape in the generated images is controlled by the audio and synchronized with the audio-source video.

The head motion in the generated images is controlled by the pose source and synchronized with the video below.

Part 1 Background

Audio-driven talking face generation (talking face, talking head generation) has been studied under many different experimental settings. For a survey of the field, see Lele Chen's What comprises a good talking-head video generation?: A Survey and Benchmark [1]. This article focuses on the one-shot (single-image), arbitrary-subject, audio-driven setting. Specifically, given a single image, we want to generate a talking face video that is synchronized with the audio.


Work in this setting includes You said that? [2] from the VGG group, DAVS [3] from CUHK (by the author of this article), Lele's ATVG [4], and MakeItTalk [5] by Yang Zhou and Dr. Dingzeyu Li at Adobe. Overall, the earlier works [2][3][4] concentrate on mouth-shape accuracy and identity preservation, and so neglect natural head motion. The problem we try to solve here is that head pose has been hard to control in previous talking face generation.


[Image]

Comparison figure from the ATVG paper


The more recent MakeItTalk [5] and Lele's Rhythmic Head [6] focus on natural head motion tied to personal identity information. However, both methods rely on 3D structural information.


To control head motion independently, head pose has to be disentangled from facial expression and identity. On reflection, this disentanglement is hard to achieve in either 2D image or 2D landmark representations. In our audio-driven setting, the mouth shape must align with the audio and the head motion must look natural at the same time, which makes the problem harder still. In 3D face representations, by contrast, head pose and facial expression are naturally controlled by separate parameters, so 3D looks like the ideal choice.


For this reason, among previous works, MakeItTalk [5] chose 3D facial landmarks, while Rhythmic Head [6] relies directly on full 3D reconstruction. However, the accuracy of open-source 3D face modeling methods cannot be guaranteed, especially in extreme scenarios, and optimization-based 3D fitting adds a heavy preprocessing burden. This work therefore uses no 3D or structured data and approaches the problem from 2D again.


Part 2 Method

Our method, the Pose-Controllable Audio-Visual System (PC-AVS), achieves free control over head pose directly within a framework of feature learning and image reconstruction. Its core is a 12-dimensional pose code defined implicitly in the latent space. This design draws on last year's CVPR work that used StyleGAN for face reenactment [7] (shown in the figure below).


[Image]


That work, however, only shows that StyleGAN can use augmented frames for image-to-image control. In audio-driven talking face generation the condition actually comes from audio, and borrowing that framework wholesale makes training difficult, because speech carries no information about face pose.


Based on our observations of talking faces, we define the latent space of the augmented image as the Non-Identity Space. Intuitively, within this space we can then look for a Speech Content Space, which ties mouth shape to the audio, and a Pose Space, which represents head motion.


[Image]


The full pipeline of our work is shown in the figure below. The training data is a large collection of videos with speech. We take an arbitrary frame [image.png] as the identity reference input, warp another frame [image.png] into [image.png], use the spectrogram [image.png] of the audio aligned with [image.png] as the condition, and train the network to recover [image.png].
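The sampling step described above can be sketched roughly as follows. This is only an illustrative sketch, not the authors' data loader: the function names, the dictionary keys, the array shapes, and in particular `toy_augment` (a random shift plus a brightness change) are assumptions standing in for whatever warping the actual training code applies.

```python
import numpy as np

def toy_augment(frame, rng):
    """Hypothetical stand-in augmentation: random shift plus brightness change."""
    dy, dx = (int(s) for s in rng.integers(-4, 5, size=2))
    shifted = np.roll(frame, shift=(dy, dx), axis=(0, 1))
    return np.clip(shifted * rng.uniform(0.8, 1.2), 0.0, 1.0)

def sample_training_tuple(frames, spectrograms, rng):
    """Pick an identity reference frame and a different target frame.

    frames:       (num_frames, H, W, 3) array with values in [0, 1]
    spectrograms: (num_frames, ...) array, one spectrogram per frame
    """
    ref_idx, tgt_idx = rng.choice(len(frames), size=2, replace=False)
    return {
        "identity_ref": frames[ref_idx],                        # identity reference
        "augmented_target": toy_augment(frames[tgt_idx], rng),  # warped target frame
        "spectrogram": spectrograms[tgt_idx],                   # audio aligned with target
        "target": frames[tgt_idx],                              # frame to reconstruct
        "indices": (int(ref_idx), int(tgt_idx)),
    }

rng = np.random.default_rng(0)
frames = rng.random((10, 64, 64, 3))
spectrograms = rng.random((10, 80, 20))
sample = sample_training_tuple(frames, spectrograms, rng)
```

The reference and target frames are drawn without replacement so that the network cannot simply copy the reference image to reconstruct the target.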


[Image]


Using the identity labels of the dataset as a constraint, we obtain the Identity Space through the ID encoder [image.png]. With the augmentation described earlier, we obtain the Non-Identity Space through the encoder [image.png]. The remaining questions are how to bring the audio into play, and how to make the image constrain only the pose without controlling the mouth shape.


Learning Speech Content Space. We want the Non-Identity Space feature to pass through a mapping [image.png] into the speech content space. Learning this latent space relies mainly on the natural alignment and synchronization between audio and video. Previous work has shown this to be one of the most widely used forms of self-supervision in the audio-visual field [8]. Here we use the alignment between speech and face sequences to build a contrastive loss that enforces the alignment: an aligned face sequence and speech feature [image.png] form a positive sample, and non-aligned ones [image.png] are negative samples. Defining the cosine distance between two features as [image.png], the constraint can be written as:


[Equation image: contrastive loss]
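Because the equation survives only as an image placeholder, its exact form cannot be read from this page. As a rough illustration of the idea, the sketch below implements a generic softmax-style contrastive loss over cosine similarities, with one aligned (positive) audio feature and several non-aligned (negative) ones. The function names and the choice of this particular form are assumptions, not a transcription of the paper's formula.

```python
import numpy as np

def cosine_similarity(query, candidates):
    """Cosine similarity between query (d,) and each row of candidates (n, d)."""
    query = query / np.linalg.norm(query)
    candidates = candidates / np.linalg.norm(candidates, axis=-1, keepdims=True)
    return candidates @ query

def contrastive_loss(visual, audio_pos, audio_negs):
    """Negative log-probability of picking the aligned audio feature.

    visual:     (d,)   visual speech-content feature
    audio_pos:  (d,)   audio feature aligned with the visual one
    audio_negs: (n, d) audio features taken from non-aligned clips
    """
    candidates = np.vstack([audio_pos[None, :], audio_negs])  # positive is row 0
    sims = cosine_similarity(visual, candidates)
    sims = sims - sims.max()  # shift for numerical stability
    return float(-(sims[0] - np.log(np.exp(sims).sum())))

visual = np.array([1.0, 0.0])
aligned = contrastive_loss(visual, np.array([1.0, 0.0]),
                           np.array([[0.0, 1.0], [-1.0, 0.0]]))
misaligned = contrastive_loss(visual, np.array([-1.0, 0.0]),
                              np.array([[0.0, 1.0], [1.0, 0.0]]))
```

The loss is small when the visual feature is closest to its aligned audio feature and grows when a negative is closer, which is the behaviour the alignment constraint is meant to encourage.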


Devising Pose Code. For the pose, we borrow prior knowledge from 3D representations. A 12-dimensional vector is in fact enough to express head pose: a 9-dimensional rotation matrix, a 2-dimensional translation, and a 1-dimensional scale. We therefore use an additional mapping that maps the Non-Identity Space to a 12-dimensional pose code. This choice of dimensionality is very important: if the dimension is too large, the latent code may express more than pose information, and the mouth shape is affected.
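The dimension count above (9 + 2 + 1 = 12) can be made concrete with a small sketch. Note that in PC-AVS the pose code is learned implicitly and its entries are not these explicit parameters. The code below only shows why 12 numbers suffice for an explicit pose. The helper names `pack_pose` and `yaw_rotation` are hypothetical.

```python
import numpy as np

def yaw_rotation(angle_rad):
    """3x3 rotation matrix for a rotation about the vertical (y) axis."""
    c, s = np.cos(angle_rad), np.sin(angle_rad)
    return np.array([[c, 0.0, s],
                     [0.0, 1.0, 0.0],
                     [-s, 0.0, c]])

def pack_pose(rotation, translation, scale):
    """Flatten rotation (3x3), 2D translation and scalar scale into 12 numbers."""
    rotation = np.asarray(rotation, dtype=float)
    translation = np.asarray(translation, dtype=float)
    if rotation.shape != (3, 3) or translation.shape != (2,):
        raise ValueError("expected a 3x3 rotation and a 2D translation")
    return np.concatenate([rotation.ravel(), translation, [float(scale)]])

pose = pack_pose(yaw_rotation(np.pi / 6), [0.1, -0.05], 1.2)
```

A rotation matrix has only three degrees of freedom, so the 9-number form is redundant, but it keeps the sketch consistent with the 9 + 2 + 1 breakdown given in the text.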


Finally, we combine the Identity Space, the Speech Content Space, and the pose code, and feed them into a generator modified from StyleGAN2 [9]. The three sources of information are balanced inside the generator through image reconstruction training, and the loss takes the form of the reconstruction training loss used in pix2pixHD. The pose code works during training for the following reason: with the ID and pose information both explicitly constrained, the easiest thing for the pose code to learn is to change the head pose so as to reduce the reconstruction loss. Under this objective, as the pose gradually matches our target, the reconstruction constraint on the mouth shape in turn helps the learning of the audio feature, and a balance is reached.
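As a rough illustration of how the three pieces might be combined before entering the generator, the sketch below simply concatenates them into one vector. The concatenation itself, the function name, and the identity and content dimensions are assumptions made for illustration. Only the 12-dimensional pose code comes from the text.

```python
import numpy as np

# ID_DIM and CONTENT_DIM are hypothetical sizes; POSE_DIM = 12 is from the text.
ID_DIM, CONTENT_DIM, POSE_DIM = 512, 256, 12

def build_generator_input(identity_feat, content_feat, pose_code):
    """Concatenate identity, speech-content and pose features into one vector."""
    if pose_code.shape != (POSE_DIM,):
        raise ValueError("pose code must be 12-dimensional")
    return np.concatenate([identity_feat, content_feat, pose_code])

rng = np.random.default_rng(0)
identity_feat = rng.standard_normal(ID_DIM)      # from the identity reference frame
content_feat = rng.standard_normal(CONTENT_DIM)  # from the driving audio
pose_code = rng.standard_normal(POSE_DIM)        # from the pose-source video

style = build_generator_input(identity_feat, content_feat, pose_code)

# Swapping in a pose code from a different video changes only the last 12 entries,
# leaving the identity and speech-content parts untouched.
other_pose = rng.standard_normal(POSE_DIM)
swapped = build_generator_input(identity_feat, content_feat, other_pose)
```

Keeping the pose code as a separate, low-dimensional slice is what makes it possible to drive head motion from one video while the mouth follows another audio track.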


Part 3 Experimental Results

We compare our method quantitatively and qualitatively with previous state-of-the-art methods for arbitrary-subject audio-driven face generation. Quantitatively, we evaluate on two datasets, LRW and VoxCeleb2, focusing on the reconstruction fidelity of the generated images (SSIM), image sharpness (CPBD), the accuracy of the generated mouth landmarks (LMD), and the synchronization between the generated mouth shape and the audio, measured with the confidence score of SyncNet [8] ([image.png]).
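Of these metrics, the landmark distance is the simplest to write down. A minimal sketch follows, assuming LMD is the mean Euclidean distance between corresponding predicted and ground-truth mouth landmarks over all frames. The function name and array layout are hypothetical, and the paper's evaluation code may normalize or select landmarks differently.

```python
import numpy as np

def landmark_distance(pred, target):
    """Mean Euclidean distance between corresponding 2D landmarks.

    pred, target: arrays of shape (frames, points, 2)
    """
    pred = np.asarray(pred, dtype=float)
    target = np.asarray(target, dtype=float)
    if pred.shape != target.shape:
        raise ValueError("landmark arrays must have the same shape")
    return float(np.linalg.norm(pred - target, axis=-1).mean())

target_landmarks = np.zeros((2, 3, 2))
pred_landmarks = target_landmarks + np.array([3.0, 4.0])  # every point off by (3, 4)
lmd = landmark_distance(pred_landmarks, target_landmarks)
```

Lower values mean the generated mouth landmarks sit closer to the ground truth.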


[Image]


A visual comparison with previous methods is shown below:


[Image]


More ablations and results can be found in our paper and demo video. Here we show generation results under extreme conditions (large angles, low resolution). We also show that setting the pose code to 0 produces frontalized talking faces.


[Image]


Part 4 Summary

In this work we propose the Pose-Controllable Audio-Visual System (PC-AVS), which generates pose-controllable results in the audio-driven, arbitrary-speaker setting. Overall, the following characteristics of our method are worth noting:


  1. Our method does not rely on predefined structural information. Using only an image reconstruction pipeline, it defines a representation of face pose.

  2. The training scheme balanced by the style-based generator gives lip generation a better-matched reconstruction constraint, which improves the accuracy of lip alignment.

  3. We achieve free head pose control for arbitrary talking faces, making the generated results more realistic.

  4. Our model is robust under extreme conditions and can also generate frontalized talking faces.


Related Links

Paper: https://arxiv.org/abs/2104.11116

Github: https://github.com/Hangz-nju-cuhk/Talking-Face_PC-AVS

Project Page: https://hangz-nju-cuhk.github.io/projects/PC-AVS


References

1. What comprises a good talking-head video generation?: A Survey and Benchmark. https://arxiv.org/abs/2005.03201

2. Joon Son Chung, Amir Jamaludin, and Andrew Zisserman. You said that? In BMVC, 2017. https://arxiv.org/abs/1705.02966

3. Hang Zhou, Yu Liu, Ziwei Liu, Ping Luo, and Xiaogang Wang. Talking face generation by adversarially disentangled audio-visual representation. In Proceedings of the AAAI Conference on Artificial Intelligence (AAAI), 2019. https://arxiv.org/abs/1807.07860

4. Lele Chen, Ross K Maddox, Zhiyao Duan, and Chenliang Xu. Hierarchical cross-modal talking face generation with dynamic pixel-wise loss. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2019. https://www.cs.rochester.edu/u/lchen63/cvpr2019.pdf

5. Yang Zhou, Xintong Han, Eli Shechtman, Jose Echevarria, Evangelos Kalogerakis, and Dingzeyu Li. MakeItTalk: Speaker-aware talking head animation. SIGGRAPH ASIA, 2020. https://arxiv.org/abs/2004.12992

6. Lele Chen, Guofeng Cui, Celong Liu, Zhong Li, Ziyi Kou, Yi Xu, and Chenliang Xu. Talking-head generation with rhythmic head motion. European Conference on Computer Vision (ECCV), 2020. https://www.cs.rochester.edu/u/lchen63/eccv2020-arxiv.pdf

7. Egor Burkov, Igor Pasechnik, Artur Grigorev, and Victor Lempitsky. Neural head reenactment with latent pose descriptors. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2020. https://openaccess.thecvf.com/content_CVPR_2020/papers/Burkov_Neural_Head_Reenactment_with_Latent_Pose_Descriptors_CVPR_2020_paper.pdf

8. Joon Son Chung and Andrew Zisserman. Out of time: automated lip sync in the wild. In ACCV Workshop, 2016. https://www.robots.ox.ac.uk/~vgg/publications/2016/Chung16a/chung16a.pdf

9. Tero Karras, Samuli Laine, Miika Aittala, Janne Hellsten, Jaakko Lehtinen, and Timo Aila. Analyzing and improving the image quality of StyleGAN. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2020. https://openaccess.thecvf.com/content_CVPR_2020/papers/Karras_Analyzing_and_Improving_the_Image_Quality_of_StyleGAN_CVPR_2020_paper.pdf
