INDEX
Explanations
various forms of language and communication in written texts
Negative Logits
ó        -0.17
Äĵ       -0.16
974      -0.16
Ä«       -0.16
úi       -0.16
æ¯       -0.15
ó        -0.15
al       -0.15
.sz      -0.15
ÑĢади    -0.14
Positive Logits
Ãł       0.28
Ãł       0.27
'Ãł      0.22
ÃĢ       0.22
Ãłn      0.22
ÃĢ       0.21
Ãłm      0.20
bÃł      0.19
’Ãł      0.19
th       0.18
Activations Density: 0.015%