INDEX
Explanations
phrases indicating location or direction
instances of the phrase "one of" indicating examples or categories
Negative Logits
lees          -0.77
ans           -0.70
anse          -0.66
ption         -0.65
body          -0.65
etermination  -0.64
matter        -0.62
disadvant     -0.60
leans         -0.60
ening         -0.59
Positive Logits
these         0.77
Europe        0.76
those         0.75
three         0.75
our           0.74
Britain       0.74
two           0.72
America       0.72
my            0.71
innumerable   0.70
Activations Density 0.071%