Explanations
references to role models and their impact on communities
Negative Logits
Token        Logit
íĦ¸          -0.16
igsaw        -0.14
Registrar    -0.14
511          -0.13
antu         -0.13
ạn           -0.13
Ñģна         -0.12
307          -0.12
aidu         -0.12
789          -0.12
Positive Logits
Token        Logit
role         0.70
Role         0.65
role         0.61
Role         0.56
ROLE         0.52
-role        0.52
_role        0.47
(role        0.43
.role        0.43
ROLE         0.43
Activations Density: 0.157%