INDEX
Explanations
articles (such as "the" and "an") followed by specific words or phrases
the presence of the word "the" in various contexts
Negative Logits
besides          -0.63
ibl              -0.62
!!!              -0.58
???              -0.58
resided          -0.57
!.               -0.57
aji              -0.57
reminis          -0.57
distinguishes    -0.57
itars            -0.56
Positive Logits
nation           0.99
same             0.96
latter           0.96
country          0.95
Kremlin          0.90
United           0.89
latest           0.89
Clintons         0.86
Philippines      0.85
aftermath        0.84
Activations Density 0.989%
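
The density figure above is most naturally read as the percentage of sampled tokens on which the feature is active. The page does not say how it was computed, so the sketch below is only an assumption: it counts an activation as "active" when it exceeds a threshold of zero. The function name `activation_density`, the `threshold` parameter, and the sample data are all hypothetical.

```python
def activation_density(activations, threshold=0.0):
    """Return the percentage of activations strictly above `threshold`.

    Assumption: "density" means the share of tokens where the feature fires.
    The source page does not state its exact definition.
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


# Hypothetical sample: 1,000 tokens, of which 10 have a nonzero activation.
sample = [0.0] * 990 + [1.5] * 10
print(f"{activation_density(sample):.3f}%")
```

Under this reading, a density of 0.989% would mean the feature fires on roughly one token in a hundred, which fits a feature tied to a frequent word such as "the" appearing only in particular contexts.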