CVPR 2021 | "Moving People with Sound": Pose-Controllable Audio-Driven Talking Faces
This work generates audio-driven talking faces for arbitrary identities with controllable head pose, without using any hand-defined structural information (facial landmarks or 3D face models). Its key idea is to implicitly define a 12-dimensional pose code in the latent space and use it to control head motion.
Compared with previous methods, this approach avoids the problems caused by inaccurate landmark or 3D model estimation while retaining both flexibility and robustness. The audio drives an accurate mouth shape, while a separate video controls the head motion. Under this framework, anyone can be made to deliver Master Ma's classic "no martial ethics" (不讲武德) line; the Easter egg is at the end of our demo video!
This work was completed jointly by the CUHK-SenseTime Joint Lab, SenseTime, and S-Lab at Nanyang Technological University, among others.

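The mouth shape of the generated images is controlled by the audio and is synchronized with the audio-source video;
the head motion of the generated images is controlled by the pose source and is synchronized with the video below.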
Part 1 Background
Audio-driven talking face generation (talking face or talking head generation) has been studied under many different experimental settings. For a survey of the field, see Lele Chen's What comprises a good talking-head video generation?: A Survey and Benchmark [1]. This article focuses on the one-shot setting: audio-driven talking face generation for arbitrary identities from a single image. Concretely, given one image, we want to generate a talking face video that is synchronized with the input audio.
Work in this setting includes You said that? [2] from the VGG group, DAVS [3] from CUHK (the author's own earlier work), Lele Chen's ATVG [4], and MakeItTalk [5] by Yang Zhou and Dr. Dingzeyu Li of Adobe. Overall, the earlier methods [2][3][4] concentrate on lip accuracy and identity preservation, and consequently neglect natural head motion. The problem we address in this paper is that head pose has been difficult to control in previous talking face generation methods.

Comparison figure from the ATVG paper
The more recent MakeItTalk [5] and Lele Chen's Rhythmic Head [6] focus on natural head motion that is related to personal identity. However, both methods rely on 3D structural information.
Controlling head motion independently requires disentangling head pose from facial expression and identity. On reflection, this disentanglement is hard to achieve with either 2D images or 2D landmarks as the representation. In the audio-driven setting the mouth shape must also align with the audio while the head motion stays natural, which makes the problem harder still. In a 3D face representation, by contrast, head pose and facial expression are naturally controlled by separate parameters, so 3D looks like the best choice.
For this reason, among earlier work MakeItTalk [5] uses 3D facial landmarks, while Rhythmic Head [6] relies directly on full 3D reconstruction. However, the accuracy of open-source 3D face modeling methods cannot be guaranteed, especially in extreme scenarios, and optimization-based 3D fitting adds a heavy preprocessing burden. This paper therefore uses no 3D or structural data and approaches the problem from 2D again.
Part 2 Method
Our method, the Pose-Controllable Audio-Visual System (PC-AVS), achieves free control over head pose directly within a framework of feature learning and image reconstruction. Its core is a 12-dimensional pose code defined implicitly in the latent space. This design draws on a CVPR 2020 work that uses StyleGAN for face reenactment [7] (see the figure below).

That work, however, only shows that StyleGAN can perform image-to-image control using augmented frames. In audio-driven talking face generation the condition actually comes from the audio, so borrowing this framework directly makes training difficult, because audio carries no information about face pose.
Based on our observations of talking faces, we define the latent space of the augmented images as the Non-Identity Space. Intuitively, within this space we can then look for a Speech Content Space, in which mouth shape is associated with the audio, and a Pose Space, which represents head motion.

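The complete pipeline of our work is shown in the figure below. The training data is a large collection of videos with speech. We take an arbitrary frame as the identity reference input, apply a deforming augmentation to another frame, and use the spectrogram of the audio aligned with that second frame as the condition. The network is then trained to recover the second frame.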

Using the identity labels of the dataset as a constraint, we obtain the Identity Space through an ID encoder; with the augmentation described above, we obtain the Non-Identity Space through a second encoder. The remaining questions are how to bring the audio into play, and how to make the image constrain only the pose and not the mouth shape.
Learning Speech Content Space. We want the features of the Non-Identity Space to be mapped into the speech content space by a mapping. Learning this latent space relies mainly on the natural alignment (synchronization) between audio and video, which earlier work has shown to be one of the most widely useful forms of self-supervision in the audio-visual field [8]. Here we build a contrastive loss on the alignment between audio and face sequences: an aligned pair of face-sequence feature and audio feature is a positive sample, and a non-aligned pair is a negative sample. With the cosine distance between two features as the similarity measure, the constraint is formulated as a contrastive loss over these positive and negative pairs.
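The alignment constraint described above can be illustrated with a minimal sketch. It assumes a softmax-style contrastive loss over cosine similarities for a single visual feature; the function names and the one-directional form are illustrative assumptions, and the exact loss used in the paper may differ (for example, it may also be applied from audio to video).

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine similarity between two 1-D feature vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-8))

def contrastive_loss(visual, audio_pos, audio_negs):
    """Softmax-style contrastive loss for one visual feature (hypothetical form).

    visual:     (d,) speech content feature of a face sequence
    audio_pos:  (d,) feature of the aligned audio clip (positive sample)
    audio_negs: (n, d) features of non-aligned audio clips (negative samples)
    """
    pos = np.exp(cosine_similarity(visual, audio_pos))
    neg = sum(np.exp(cosine_similarity(visual, a)) for a in audio_negs)
    # The loss shrinks as the positive pair becomes more similar than the negatives.
    return float(-np.log(pos / (pos + neg)))
```

Minimizing this quantity pulls the aligned visual and audio features together and pushes non-aligned pairs apart, which is the behavior the text attributes to the alignment constraint.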

Devising Pose Code. For the pose, we borrow prior knowledge from 3D representations. A 12-dimensional vector is already sufficient to express head pose: a 9-dimensional rotation matrix, a 2-dimensional translation, and a 1-dimensional scale. We therefore use an additional mapping that maps the Non-Identity Space to a 12-dimensional Pose Code. The choice of dimension is important: if it is too large, the latent code may encode more than pose information and end up affecting the mouth shape.
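To make the 9 + 2 + 1 count concrete, the sketch below packs an explicit rigid pose into a 12-dimensional vector. This is only an illustration of why 12 numbers suffice: the Pose Code in the method is learned implicitly in the latent space and is not claimed to equal this explicit vector. The Euler-angle convention and the function names are assumptions.

```python
import numpy as np

def rotation_matrix(yaw, pitch, roll):
    """3x3 rotation from Euler angles in radians (axis order is an assumption)."""
    cy, sy = np.cos(yaw), np.sin(yaw)
    cp, sp = np.cos(pitch), np.sin(pitch)
    cr, sr = np.cos(roll), np.sin(roll)
    rz = np.array([[cr, -sr, 0.0], [sr, cr, 0.0], [0.0, 0.0, 1.0]])  # roll about z
    ry = np.array([[cy, 0.0, sy], [0.0, 1.0, 0.0], [-sy, 0.0, cy]])  # yaw about y
    rx = np.array([[1.0, 0.0, 0.0], [0.0, cp, -sp], [0.0, sp, cp]])  # pitch about x
    return rz @ ry @ rx

def explicit_pose_vector(yaw, pitch, roll, tx, ty, scale):
    """Flatten rotation (9), 2D translation (2) and scale (1) into 12 values."""
    rot = rotation_matrix(yaw, pitch, roll).reshape(-1)
    return np.concatenate([rot, [tx, ty], [scale]])
```

A rotation matrix has only three degrees of freedom, so 12 values already over-parameterize a rigid head pose. This is consistent with the remark that a larger code would leave room for non-pose information.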
Finally, we combine the features from the Identity Space and the Speech Content Space with the Pose Code, and feed them into a generator adapted from StyleGAN2 [9]. The three sources of information are balanced inside the generator through image reconstruction training, using the reconstruction losses of pix2pixHD. The pose code works during training for the following reason: with identity and speech content both explicitly constrained (by the identity supervision and the contrastive loss, respectively), the easiest thing for the Pose Code to learn in order to reduce the reconstruction loss is to change the head pose. As the pose gradually comes to match the target, the reconstruction constraint on the mouth shape in turn helps the learning of the audio feature, and the training reaches a balance.
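The combination step can be sketched as a simple concatenation, which also shows how the pose can be taken from a different source than the identity and the speech. Concatenation as the fusion operator, the identity and content feature sizes, and all names below are assumptions for illustration; only the 12-dimensional pose code comes from the text.

```python
import numpy as np

# Only POSE_DIM = 12 is stated in the text; the other sizes are hypothetical.
ID_DIM, CONTENT_DIM, POSE_DIM = 2048, 512, 12

def fuse_features(f_id, f_content, f_pose):
    """Concatenate identity, speech content and pose features into one vector."""
    if f_id.shape != (ID_DIM,) or f_content.shape != (CONTENT_DIM,):
        raise ValueError("unexpected identity or content feature size")
    if f_pose.shape != (POSE_DIM,):
        raise ValueError("pose code must be 12-dimensional")
    return np.concatenate([f_id, f_content, f_pose])

def swap_pose(fused, new_pose):
    """Return a copy of a fused vector whose pose part is replaced."""
    out = fused.copy()
    out[-POSE_DIM:] = new_pose
    return out
```

In such a layout, driving the head motion from another video amounts to replacing only the last 12 entries while the identity and speech content parts stay fixed.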
Part 3 Experimental Results
We compare with previous state-of-the-art methods for arbitrary-identity audio-driven talking face generation, both quantitatively and qualitatively. Quantitatively, we evaluate on two datasets, LRW and VoxCeleb2, focusing on reconstruction fidelity of the generated images (SSIM), image sharpness (CPBD), accuracy of the generated mouth landmarks (LMD), and synchronization between the generated mouth shape and the audio, measured with the confidence score of SyncNet [8].
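Of these metrics, the landmark distance is the simplest to write down. The sketch below assumes LMD is the mean Euclidean distance between corresponding predicted and ground-truth landmarks over all frames; which landmarks are used and whether any normalization is applied are not stated in the text, so those details are assumptions.

```python
import numpy as np

def landmark_distance(pred, target):
    """Mean Euclidean distance between corresponding 2D landmarks.

    pred, target: arrays of shape (frames, points, 2).
    """
    pred = np.asarray(pred, dtype=float)
    target = np.asarray(target, dtype=float)
    if pred.shape != target.shape:
        raise ValueError("pred and target must have the same shape")
    # Per-landmark distances, averaged over frames and points.
    return float(np.linalg.norm(pred - target, axis=-1).mean())
```

Lower values mean the generated mouth landmarks lie closer to the ground truth.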

A qualitative comparison with previous methods is shown below:

More ablations and results can be found in our paper and demo video. Here we show results under extreme conditions (large head angles and low resolution), and show that setting the pose code to 0 produces a frontalized talking face.

Part 4 Summary
In this work we propose the Pose-Controllable Audio-Visual System (PC-AVS), which generates pose-controllable results in the audio-driven, arbitrary-speaker setting. Overall, the following properties of our method are worth noting:
Our method defines a representation of face pose using only an image reconstruction pipeline, without any predefined structural information.
The training scheme, balanced by the style-based generator, gives lip generation a better-matched reconstruction constraint and thereby improves the accuracy of lip synchronization.
We achieve free head pose control for arbitrary talking faces, which makes the generated results more realistic.
Our model is robust under extreme conditions and can also generate frontalized talking faces.
Related Links
Paper: https://arxiv.org/abs/2104.11116
GitHub: https://github.com/Hangz-nju-cuhk/Talking-Face_PC-AVS
Project Page: https://hangz-nju-cuhk.github.io/projects/PC-AVS
References
1. Lele Chen et al. What comprises a good talking-head video generation?: A Survey and Benchmark. https://arxiv.org/abs/2005.03201
2. Joon Son Chung, Amir Jamaludin, and Andrew Zisserman. You said that? In BMVC, 2017. https://arxiv.org/abs/1705.02966
3. Hang Zhou, Yu Liu, Ziwei Liu, Ping Luo, and Xiaogang Wang. Talking face generation by adversarially disentangled audio-visual representation. In Proceedings of the AAAI Conference on Artificial Intelligence (AAAI), 2019. https://arxiv.org/abs/1807.07860
4. Lele Chen, Ross K Maddox, Zhiyao Duan, and Chenliang Xu. Hierarchical cross-modal talking face generation with dynamic pixel-wise loss. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2019. https://www.cs.rochester.edu/u/lchen63/cvpr2019.pdf
5. Yang Zhou, Xintong Han, Eli Shechtman, Jose Echevarria, Evangelos Kalogerakis, and Dingzeyu Li. MakeItTalk: Speaker-aware talking head animation. SIGGRAPH Asia, 2020. https://arxiv.org/abs/2004.12992
6. Lele Chen, Guofeng Cui, Celong Liu, Zhong Li, Ziyi Kou, Yi Xu, and Chenliang Xu. Talking-head generation with rhythmic head motion. In European Conference on Computer Vision (ECCV), 2020. https://www.cs.rochester.edu/u/lchen63/eccv2020-arxiv.pdf
7. Egor Burkov, Igor Pasechnik, Artur Grigorev, and Victor Lempitsky. Neural head reenactment with latent pose descriptors. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2020. https://openaccess.thecvf.com/content_CVPR_2020/papers/Burkov_Neural_Head_Reenactment_with_Latent_Pose_Descriptors_CVPR_2020_paper.pdf
8. Joon Son Chung and Andrew Zisserman. Out of time: automated lip sync in the wild. In ACCV Workshop, 2016. https://www.robots.ox.ac.uk/~vgg/publications/2016/Chung16a/chung16a.pdf
9. Tero Karras, Samuli Laine, Miika Aittala, Janne Hellsten, Jaakko Lehtinen, and Timo Aila. Analyzing and improving the image quality of StyleGAN. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2020. https://openaccess.thecvf.com/content_CVPR_2020/papers/Karras_Analyzing_and_Improving_the_Image_Quality_of_StyleGAN_CVPR_2020_paper.pdf




