INDEX
Explanations
instances of significant events or actions related to social interactions or community involvement
Negative Logits
Mali    -0.15
ollen   -0.15
ament   -0.14
ì´Ī     -0.14
ments   -0.14
amel    -0.14
558     -0.14
èĩ¨     -0.14
fier    -0.13
_mas    -0.13
Positive Logits
isay    0.19
ATAL    0.17
orris   0.16
γα      0.15
ihan    0.15
makta   0.14
%c      0.14
USA     0.14
¯u      0.14
ije     0.14
Activations Density: 0.001%