INDEX
Explanations
occurrences of the prefix "re-", indicating repetition or return to a previous state or action
Negative Logits
ifact    -0.16
anka     -0.15
andr     -0.15
맨       -0.15
ouz      -0.14
hec      -0.14
fy       -0.14
atum     -0.14
ender    -0.14
priv     -0.13
Positive Logits
kind     0.22
inv      0.19
kind     0.18
charge   0.18
eling    0.17
KIND     0.16
eled     0.16
Hatch    0.16
Kind     0.16
charges  0.16
Activations Density: 0.034%