Explanations
mention of the word "the" in various contexts
Negative Logits
deals      -0.70
ettings    -0.67
peak       -0.65
puff       -0.65
ben        -0.63
hooting    -0.63
duction    -0.61
oshi       -0.60
lang       -0.59
ward       -0.59
Positive Logits
opportunity   1.22
same          1.14
slightest     1.11
utmost        1.08
privilege     0.98
requisite     0.97
idea          0.96
courage       0.95
guts          0.93
brunt         0.91
Activations Density: 0.054%