INDEX
Explanations
the word "et" and its variations
Negative Logits
erator      -0.17
plum        -0.15
tend        -0.15
inecraft    -0.15
-mouth      -0.14
baÅŁ        -0.14
usement     -0.14
ogle        -0.14
dater       -0.14
istr        -0.14
POSITIVE LOGITS
ymology     0.31
ihad        0.28
ienne       0.26
ching       0.25
ched        0.25
ablish      0.24
iology      0.23
ym          0.23
alon        0.23
iological   0.22
Activations Density: 0.014%