INDEX
Explanations
words expressing perception or interpretation
Negative Logits
sein  -0.17
pot  -0.17
ses  -0.15
esian  -0.15
ught  -0.15
ы  -0.15
apl  -0.15
omer  -0.14
acin  -0.14
ョ  -0.14
Positive Logits
zial  0.18
lessly  0.17
cref  0.17
ingly  0.16
.kr  0.15
ively  0.14
unci  0.14
лян  0.14
BarItem  0.14
ๆ  0.13
Activation Density: 0.041%