Explanations
phrases that indicate significant actions or states of being in relation to existence and presence
Negative Logits
ertos      -0.14
Kir        -0.14
Kash       -0.14
kir        -0.14
Reverse    -0.13
egal       -0.13
reverse    -0.13
orent      -0.13
rx         -0.13
ution      -0.13
Positive Logits
zell       0.16
ume        0.15
ÐĶÐļ       0.15
нем        0.14
ninh       0.14
oner       0.14
okie       0.14
enko       0.14
onna       0.13
_TI        0.13
Activations Density 0.011%