INDEX
Explanations
occurrences of the word "the"
Negative Logits
baugh          -0.85
tons           -0.77
_-             -0.75
Occupations    -0.71
Alone          -0.69
raq            -0.68
SOL            -0.68
borough        -0.67
gow            -0.66
ibur           -0.66
Positive Logits
nucleus        0.80
suggestion     0.79
rouse          0.71
ink            0.71
opposite       0.69
prospect       0.68
odds           0.67
threat         0.67
latest         0.66
extent         0.66
Activations Density: 0.037%