ICLR 2023 Oral | Exploring the Relationship between Network Architecture and Domain Generalization

01 Motivation and Background
Domain generalization (DG) refers to a model's ability to make accurate predictions in a new, unseen domain without any knowledge specific to that domain.
DG has high practical value. For example, in medical diagnosis, where data is hard to obtain, a model must generalize across different hospitals, cities, or countries in order to diagnose reliably; in autonomous driving, a model must generalize across different weather, road conditions, and road types to drive safely.
DG is therefore an important research direction: it enables machine learning models to perform well in a much wider range of real-world scenarios.
Current DG methods fall mainly into the following categories:
- Data-augmentation-based methods: apply varied augmentations to the training data (rotation, translation, scaling, etc.) to increase its diversity and thereby improve generalization.
- Feature-alignment-based methods: align the features of the source and target domains to reduce the distribution gap between domains.
- Meta-learning-based methods: learn, during training, how to adapt quickly to new domains.
- Ensemble-learning-based methods: combine multiple different models or training procedures to improve generalization.
Each of these methods has strengths and weaknesses; the main drawbacks are:
- Data augmentation can cause overfitting, because the augmentations may make the model focus too heavily on certain specific features.
- Feature alignment requires aligning source- and target-domain data, but in practice the distribution gap between domains can be large, making the alignment ineffective.
- Meta-learning needs large amounts of meta-training data and can also overfit, because its objective is fast adaptation to new domains rather than optimal performance on the whole training set.
- Ensemble learning must combine multiple models or training procedures, which raises computational cost and may require more training data.
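As a concrete illustration of the data-augmentation family above, here is a minimal, framework-free sketch of label-preserving augmentations on a 2D grayscale image (real DG pipelines would use a library such as torchvision; the function and its two augmentations are illustrative, not from the paper):

```python
import random

def augment(image, seed=None):
    """Randomly flip and translate a 2D image (list of rows) to
    increase training-data diversity. The label is preserved, so the
    model sees more varied views of the same class."""
    rng = random.Random(seed)
    out = [row[:] for row in image]          # work on a copy
    if rng.random() < 0.5:                   # random horizontal flip
        out = [row[::-1] for row in out]
    shift = rng.randint(-1, 1)               # translate by -1..1 pixels
    if shift:
        pad = [0] * abs(shift)
        out = [(pad + row[:-shift]) if shift > 0 else (row[-shift:] + pad)
               for row in out]
    return out

img = [[1, 2, 3],
       [4, 5, 6]]
aug = augment(img, seed=0)                   # same shape, different view
```

The drawback noted above shows up directly here: if the chosen transformations emphasize certain cues, the model may over-attend to exactly those cues.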
Having understood the problems with existing DG methods, we believe it is worth approaching the problem from a new angle.
The recently emerged Vision Transformers have been gradually replacing CNNs across vision tasks and have become a widely adopted architecture. This suggests that network architecture and generalization ability may be closely linked.
In machine learning, inductive bias refers to the prior knowledge and assumptions built into model design and learning algorithms; it helps a model learn useful patterns from data rather than merely memorizing specific training examples. A good inductive bias helps a model converge faster, generalize more accurately to new data, and better resist overfitting.
Different network architectures provide different inductive biases and different capacities for representing data. For example, convolutional neural networks (CNNs) excel on images because the convolutional structure naturally matches the locality and translation invariance of image data. Similarly, recurrent neural networks (RNNs) suit sequential data because of their inherent temporal inductive bias.
Some prior work has proposed theoretical tools [1,2] for analyzing how well a given neural architecture can solve different problems. However, these analyses remain within the in-distribution learning setting, whereas DG is fundamentally an out-of-distribution learning problem. We therefore extend the algorithmic alignment framework proposed in [1] to the DG setting.
02 Method
Following the analysis above, we conjecture that a good network architecture can more easily learn, from data, features that are suitable for domain generalization. Starting from this conjecture, we use the Algorithmic Alignment framework to carry out the analysis theoretically, step by step.
We first briefly introduce Algorithmic Alignment: it characterizes the easiness of an independent and identically distributed (IID) reasoning task by measuring the similarity between the network architecture and the target function.
Algorithmic Alignment is formally defined as follows.
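For reference, the definition from [1] can be paraphrased roughly as follows (the learnability notion and constants follow the PAC-learning setup of [1]; this is a paraphrase, not the verbatim statement):

```latex
\textbf{Definition (Algorithmic Alignment, paraphrased from [1]).}
Let $g$ be a target (reasoning) function and $\mathcal{N}$ a neural
network composed of $n$ modules $\mathcal{N}_1,\dots,\mathcal{N}_n$.
$\mathcal{N}$ $(M,\epsilon,\delta)$-algorithmically aligns with $g$ if
(1) there exist functions $f_1,\dots,f_n$ that generate $g$, i.e.
$g = f_n \circ \cdots \circ f_1$, and
(2) for each $i$ there is a learning algorithm $\mathcal{A}_i$ under
which module $\mathcal{N}_i$ can $(M,\epsilon,\delta)$-learn $f_i$.
% Intuition: the easier each module can learn its step of the target
% algorithm, the better the architecture aligns with the task, and the
% lower the sample complexity of learning g.
```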

Next, we define some key concepts in DG. The target function is the invariant correlation between the training and test sets. For simplicity, we assume labels are noise-free.

½èÖúÒÔÉϵĽç˵£¬£¬£¬£¬£¬£¬ÎÒÃÇ¿ÉÒÔ½«Ëã·¨¶ÔÆë´Ó×ÔÁ¦Í¬ÂþÑÜ·º»¯£¨IID generalization£©À©Õ¹µ½Óò·º»¯£¨DG£©ÎÊÌâÉÏ¡£¡£¡£¡£¡£¡£¡£

Theorem 1 shows that a network aligned with the invariant correlation is more robust to distribution shift. We can verify the generalization ability of different network types experimentally.
We first evaluated a ViT trained with ERM on DomainBed; the results are shown in Figure 1(a). Surprisingly, despite using fewer parameters, the ERM-trained ViT already outperforms a ResNet-50 trained with SOTA DG algorithms on several datasets. This suggests that in DG, the choice of backbone architecture may matter more than the choice of loss function.

We find that when the network architecture aligns with the invariant correlation, ERM alone suffices for excellent performance. In some domains of OfficeHome and DomainNet, there is an invariant correlation between shape attributes and labels, as shown in Figure 1(b).
In contrast, the correlation between texture attributes and labels is spurious. According to the analysis in [3], multi-head attention (MHA) acts as a low-pass filter with a shape bias, whereas convolution acts as a high-pass filter with a texture bias. This explains why a ViT trained with ERM alone can beat a CNN trained with SOTA DG algorithms.
Going further, we were also curious: how can we improve the generalization ability of ViT? Theorem 1 suggests that we should use features with invariant correlations.
In image recognition, an object is usually composed of several parts (for example, an object can be described compositionally through its visual attributes [4]). In real-world image data, the label depends on multiple attributes, so for DG it is especially important to capture diverse visual attributes. For example, the Oxford dictionary defines an elephant as "a large animal with thick grey skin, large ears, two curved outer teeth called tusks and a long nose called a trunk".

So how should these visual attributes be captured, and how do they determine an object's category?

Conditional statements (the IF/ELSE of programming), as shown in Algorithm 1, can be viewed in the DG setting as a mechanism that decides an object's category across domains from combinations of its visual attributes.
Suppose we train a network on DomainNet to recognize elephants, as in the first row of Figure 1(b). Across domains, the shape and texture of elephants vary considerably, while the visual attributes (large ears, curved tusks, long trunk) remain invariant in every domain. Using conditional statements, elephant recognition can be expressed as "IF an animal has large ears, two curved tusks and a long trunk, THEN it is an elephant." The subtasks are then to recognize these visual attributes, which again calls for conditional statements.
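The elephant example above can be sketched as a toy conditional classifier. The attribute names are illustrative; in the paper's analysis, attribute detection itself is another learned conditional:

```python
def is_elephant(attrs):
    # IF an animal has large ears, two curved tusks and a long trunk,
    # THEN it is an elephant -- the conditional depends only on
    # domain-invariant attributes, never on domain-specific texture.
    return bool(attrs.get("large_ears")
                and attrs.get("curved_tusks")
                and attrs.get("long_trunk"))

# The same attribute combination holds across domains whose textures differ wildly:
photo  = {"large_ears": True, "curved_tusks": True, "long_trunk": True,
          "texture": "grey skin"}
sketch = {"large_ears": True, "curved_tusks": True, "long_trunk": True,
          "texture": "pencil strokes"}
horse  = {"large_ears": False, "curved_tusks": False, "long_trunk": False,
          "texture": "brown fur"}

assert is_elephant(photo) and is_elephant(sketch) and not is_elephant(horse)
```

The texture key varies per domain but never enters the conditional, which is exactly what makes the decision rule transfer across domains.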

Through Theorem 2, we prove that a Mixture-of-Experts architecture built on ViT with multiple experts aligns well with IF-ELSE statements under the Algorithmic Alignment framework. By executing IF-ELSE statements, it can effectively capture the features of an object's distinct regions (such as the elephant's large ears, curved tusks, and long trunk). Building on prior explorations of MoE [5,6], we propose the Generalizable Mixture-of-Experts (GMoE). Its architecture is as follows:
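A minimal sketch of the sparse MoE computation that GMoE builds on, in pure Python: a gate scores each expert for an input token, only the top-k experts run, and their outputs are combined with the normalized gate scores. The linear gate and toy experts here are simplified stand-ins; GMoE's actual router and experts sit inside ViT blocks and follow the designs in [5,6]:

```python
import math

def softmax(xs):
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    s = sum(es)
    return [e / s for e in es]

def moe_layer(token, experts, gate_weights, k=2):
    # gate: score each expert with a linear projection of the token
    scores = [sum(t * w for t, w in zip(token, gw)) for gw in gate_weights]
    # sparse routing: keep only the k highest-scoring experts
    topk = sorted(range(len(experts)), key=lambda i: scores[i], reverse=True)[:k]
    probs = softmax([scores[i] for i in topk])
    # combine the selected experts' outputs, weighted by the gate
    out = [0.0] * len(token)
    for p, i in zip(probs, topk):
        y = experts[i](token)
        out = [o + p * v for o, v in zip(out, y)]
    return out, topk

# toy "experts": simple functions standing in for the FFN experts of a ViT block
experts = [lambda x: [2.0 * v for v in x],
           lambda x: [v + 1.0 for v in x],
           lambda x: [-v for v in x]]
gate_weights = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]]

out, chosen = moe_layer([3.0, 1.0], experts, gate_weights, k=2)
print(chosen)  # → [0, 1]: the two experts with the highest gate scores
```

The sparsity is the point: different tokens (different visual attributes) are routed to different experts, which is what lets each expert specialize, as the visualization in Section 03 shows.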

03 ʵÑéЧ¹û
ÎÒÃÇÔÚTable 1ÖÐÌṩÁËtrain-validation selectionµÄЧ¹û£¬£¬£¬£¬£¬£¬ÆäÖаüÀ¨baselines¡¢×îеÄSOTA DGÒªÁìÒÔ¼°Ê¹ÓÃERMѵÁ·µÄGMoE¡£¡£¡£¡£¡£¡£¡£
Ч¹ûÅú×¢£¬£¬£¬£¬£¬£¬GMoE-S/16×ÝÈ»ÔÚûÓÐDGËã·¨µÄÇéÐÎÏ£¬£¬£¬£¬£¬£¬ÒѾÔÚÏÕЩËùÓÐÊý¾Ý¼¯ÉÏÌåÏÖÓÅÓÚÒÔǰ»ùÓÚResNet-50-S/16µÄDGÒªÁì¡£¡£¡£¡£¡£¡£¡£

GMoE's generalization ability comes from its backbone architecture itself, which is orthogonal to existing DG algorithms. This means SOTA DG algorithms can be applied on top of GMoE to improve its performance further.
To validate this idea, we applied two SOTA DG algorithms to GMoE: one that modifies the loss function (Fish) and one that uses model ensembling (SWAD). The results in Table 2 show that, compared with ResNet-50, adopting GMoE significantly improves the performance of these existing DG methods.

We also compared the DG performance of two models (ViT-S/16 and ResNet-50 V2) while controlling for the IID performance of the base architectures. The comparison below shows that ViT-S/16, while slightly behind ResNet-50 V2 in IID performance, still achieves better performance on DG tasks.

Below is a visualization of GMoE's expert selection. The images come from different species in the natural domain of CUB-DG. Lines of the same color connect the same visual attribute of the same bird species across different images. The same visual attribute is handled by the same expert: for example, the beak and tail are processed by Expert 3, and the left/right legs by Expert 4.


Related Resources
Paper
https://openreview.net/forum?id=RecZ9nB9Q4
Github
https://github.com/Luodian/Generalizable-Mixture-of-Experts
Video
https://www.bilibili.com/video/BV1jV4y1C7h8/?spm_id_from=333.999.0.0
References:
[1] Xu, Keyulu, et al. "What can neural networks reason about?" ICLR 2020 (Spotlight).
[2] Xu, Keyulu, et al. "How neural networks extrapolate: From feedforward to graph neural networks." ICLR 2021 (Oral).
[3] Park, Namuk, and Songkuk Kim. "How do vision transformers work?" ICLR 2022 (Spotlight).
[4] Zhou, Bolei, et al. "Object detectors emerge in deep scene CNNs." ICLR 2015.
[5] Riquelme, Carlos, et al. "Scaling vision with sparse mixture of experts." NeurIPS 2021.
[6] Chi, Zewen, et al. "On the representation collapse of sparse mixture of experts." NeurIPS 2022.




