Explanations
phrases related to personal opinions and reasoning
Negative Logits
Token    Logit
Äįka     -0.15
::__     -0.15
avel     -0.14
aim      -0.14
ç´Ķ      -0.14
immer    -0.14
ź        -0.14
ikit     -0.14
etti     -0.14
jom      -0.14
Positive Logits
Token    Logit
fact     0.16
oppins   0.15
iva      0.15
.lt      0.15
ева      0.14
fact     0.14
reasons  0.14
åİŁåĽł   0.14
اÛĮع     0.14
ÙħارÛĮ   0.14
Activations Density: 0.331%