INDEX
Explanations
references to people and position titles
instances of the word "the."
Negative Logits
arrow         -0.72
FIELD         -0.70
ties          -0.70
thood         -0.65
preceded      -0.63
amphetamine   -0.63
deals         -0.61
quit          -0.61
depended      -0.61
gotta         -0.60
POSITIVE LOGITS
latter          1.04
same            0.98
interviewer     0.90
latest          0.90
BBC             0.83
audience        0.81
aforementioned  0.80
agency          0.80
nation          0.80
extent          0.78
Activations Density 0.112%