INDEX
Explanations
phrases indicating an overall assessment or summary of situations
Negative Logits
iel      -0.16
burg     -0.15
isco     -0.15
ovsky    -0.15
sport    -0.15
rol      -0.14
oret     -0.14
ượt      -0.14
Dj       -0.14
nt       -0.14
Positive Logits
igator      0.19
mente       0.17
-purpose    0.17
적으로      0.16
sense       0.16
ihn         0.16
/general    0.16
anton       0.16
lsru        0.16
적인        0.15
Activations Density: 0.016%