Explanations
references to antagonistic groups or characters
Negative Logits
ighb      -0.16
enheim    -0.15
èİ«       -0.14
leigh     -0.14
adoo      -0.14
икÑĥ      -0.14
CRM       -0.14
igate     -0.14
cket      -0.14
orth      -0.14
Positive Logits
inspir    0.15
ÃľR       0.15
ä¼ı       0.14
uzzi      0.14
Ĵ         0.14
jas       0.13
Æ¡        0.13
inspired  0.13
ochrome   0.13
Injury    0.13
Activations Density: 0.000%