INDEX
Explanations
mentions of the word "the" followed by another word starting with "he"
the pronoun "he" in various contexts
Negative Logits
hips              -0.89
eleph             -0.78
GEAR              -0.69
linking           -0.64
domestically      -0.63
gearing           -0.60
Labrador          -0.60
gems              -0.60
correlation       -0.59
unfocusedRange    -0.58
Positive Logits
isure             1.13
atre              1.05
lling             0.99
ller              0.98
aven              0.97
aton              0.96
ALTH              0.94
ather             0.93
gan               0.93
brew              0.93
Activations Density 0.057%