Explanations
The word "de" appearing with varying activation values, possibly marking a specific keyword or concept.
Instances of the word "de".
Negative Logits
allery    -0.84
iggins    -0.68
sit       -0.68
icals     -0.67
annis     -0.67
hetti     -0.66
hips      -0.66
ieri      -0.66
impulse   -0.65
okin      -0.65
Positive Logits
ploy       1.38
utsche     1.27
cember     1.18
leted      1.14
legate     1.13
legates    1.10
bris       1.06
cker       1.03
hyde       1.03
ktop       0.97
Activation density: 0.021%