Explanations
phrases expressing cognitive awareness or understanding
Negative Logits
ton     -0.15
put     -0.15
sg      -0.15
bers    -0.14
da      -0.14
ми      -0.14
Tun     -0.14
min     -0.14
over    -0.14
borne   -0.14
Positive Logits
Plug    0.15
ROKE    0.15
otton   0.15
ANNEL   0.15
plers   0.15
lemn    0.15
rank    0.15
ULK     0.14
_CFG    0.14
.dtd    0.14
Activation Density: 0.030%
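Activation density is typically read as the share of token positions on which the feature fires. At 0.030%, that works out to about 3 tokens in every 10,000. The sketch below assumes "fires" means an activation strictly above a threshold of zero. Both the threshold and the helper name `activation_density` are hypothetical and not taken from the page.

```python
def activation_density(activations, threshold=0.0):
    """Hypothetical sketch: percentage of token positions where the feature's
    activation exceeds `threshold` (assumed to be 0 for a ReLU-style SAE)."""
    if not activations:
        raise ValueError("need at least one activation value")
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


# Three active tokens out of 10,000 gives roughly the 0.030% shown above
acts = [0.0] * 9997 + [1.2, 0.5, 2.0]
print(f"{activation_density(acts):.3f}%")  # 0.030%
```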