Explanations
instances of structured phrases that start with "to the"
occurrences of the word "the"
Negative Logits
reau        -0.68
followed    -0.65
Takes       -0.64
eton        -0.63
Split       -0.62
leground    -0.62
packages    -0.61
.–          -0.60
chu         -0.60
esh         -0.60
Positive Logits
extent       1.45
detriment    1.25
fullest      1.21
tune         1.01
rouse        0.99
venge        0.97
same         0.95
nearest      0.93
forefront    0.92
depths       0.88
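The two logit lists rank output tokens by how strongly this feature pushes their logits down or up. A common way to produce such lists, assumed here rather than confirmed by this page, is to take the dot product of the feature's decoder direction with each token's unembedding vector and sort the results. The minimal pure-Python sketch below illustrates that idea. `logit_effects`, `top_and_bottom`, and the toy two-dimensional vectors are all hypothetical and are not real model weights.

```python
from typing import Dict, List, Sequence, Tuple


def logit_effects(decoder_dir: Sequence[float],
                  unembed: Dict[str, Sequence[float]]) -> Dict[str, float]:
    """Dot product of a (hypothetical) feature decoder direction with each token's unembedding vector."""
    return {tok: sum(d * u for d, u in zip(decoder_dir, vec))
            for tok, vec in unembed.items()}


def top_and_bottom(effects: Dict[str, float],
                   k: int) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return the k most negative and the k most positive (token, effect) pairs."""
    ranked = sorted(effects.items(), key=lambda kv: kv[1])
    negative = ranked[:k]         # most negative first
    positive = ranked[::-1][:k]   # most positive first
    return negative, positive


# Toy example: hypothetical values chosen for illustration, not taken from any model.
toy_decoder = [1.0, -0.5]
toy_unembed = {
    "extent": [1.2, -0.5],
    "same": [0.7, -0.5],
    "followed": [-0.4, 0.5],
    "reau": [-0.3, 0.8],
}
effects = logit_effects(toy_decoder, toy_unembed)
negative, positive = top_and_bottom(effects, k=2)
print("negative:", negative)
print("positive:", positive)
```

In a real model the same ranking would run over the full vocabulary, and the dashboard shows only the ten tokens at each end.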
Activation density: 0.260%
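Activation density is read here as the percentage of sampled tokens on which the feature fires. This is an assumption about how the dashboard defines it, including the assumed firing threshold of zero. Under that reading, 0.260% corresponds to about 26 active tokens per 10,000. The sketch below computes the figure from a list of per-token activations. `activation_density` and its `threshold` parameter are hypothetical.

```python
from typing import Iterable


def activation_density(activations: Iterable[float], threshold: float = 0.0) -> float:
    """Percentage of tokens whose feature activation exceeds `threshold` (assumed definition)."""
    acts = list(activations)
    if not acts:
        raise ValueError("need at least one activation")
    firing = sum(1 for a in acts if a > threshold)
    return 100.0 * firing / len(acts)


# Hypothetical sample: 26 active tokens out of 10,000.
sample = [0.0] * 9974 + [0.5] * 26
density = activation_density(sample)
print(f"{density:.3f}%")
```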