INDEX
Explanations
specific non-English or foreign language terms
Negative Logits
ma      -0.23
pa      -0.20
me      -0.19
med     -0.19
li      -0.18
s       -0.18
ese     -0.18
ses     -0.18
ii      -0.18
pu      -0.18
Positive Logits
akov    0.19
Leaks   0.19
yum     0.19
yas     0.18
yaw     0.18
ê¹      0.18
yat     0.17
yar     0.17
elho    0.16
yo      0.16
Activations Density: 0.682%