INDEX
Explanations
words that indicate quantity, positioning, or relationships
Negative Logits
Token       Logit
ings        -0.15
_like       -0.15
eno         -0.15
premises    -0.14
UN          -0.14
zes         -0.14
matter      -0.14
ore         -0.14
way         -0.13
ometry      -0.13
Positive Logits
Token       Logit
.nlm        0.17
alia        0.16
.mas        0.15
owitz       0.14
ابت         0.14
imson       0.14
ë¯          0.14
еÑĢÑĤи       0.14
ê°ģ         0.14
presso      0.14
Activations Density 0.014%