INDEX
Explanations
the name "Jacobs" in the text
references to a specific individual named Jacobs
Negative Logits
anguage      -0.98
achev        -0.83
gement       -0.77
ged          -0.76
itaire       -0.76
mberg        -0.76
atively      -0.75
etheless     -0.74
otypes       -0.73
vironment    -0.72
Positive Logits
aign         0.98
Jacobs       0.84
quet         0.79
enter        0.77
keye         0.75
hma          0.73
agne         0.72
robe         0.71
xon          0.71
eries        0.71
Activations Density: 0.032%