INDEX
Explanations
phrases that express personal reflections or subjective opinions
Negative Logits
-*-\r\n   -0.15
orthand   -0.14
larım     -0.12
_Valid    -0.12
надлеж    -0.12
_Two      -0.12
reglo     -0.11
ulması    -0.11
lepší     -0.11
样的      -0.11
Positive Logits
one       1.20
یکی       0.74
uno       0.72
之一      0.69
eines     0.68
one       0.67
одного    0.66
salah     0.65
_one      0.63
одной     0.62
Activations Density 1.184%