INDEX
Explanations
phrases that reference transformative actions or processes
Negative Logits
inya      -0.16
ings      -0.15
zeitig    -0.15
need      -0.14
AVE       -0.14
kü        -0.14
itize     -0.14
rtc       -0.14
bsites    -0.14
['__      -0.14
Positive Logits
/out      0.22
yre       0.16
ampie     0.16
abajo     0.15
prising   0.15
ezi       0.15
elman     0.15
gether    0.15
issance   0.14
el        0.14
Activations Density 0.141%