Explanations
Phrases indicating opposing sides or conflicts
Instances of the phrase "at" followed by a location or context
Negative Logits
planes     -0.75
omen       -0.67
aceutical  -0.62
alities    -0.59
OTHER      -0.56
adoes      -0.56
bender     -0.55
selves     -0.53
behavior   -0.53
hops       -0.53
Positive Logits
logger     1.21
pains      1.18
fault      1.15
liberty    1.14
odds       1.06
ease       1.01
least      1.00
peace      0.94
yp         0.89
onement    0.84
Activation Density: 0.084%