Explanations
Questions starting with "Does"
Interrogative phrases that pose questions
Negative Logits
Islands    -0.82
agonists   -0.80
bags       -0.75
boards     -0.72
arters     -0.71
bush       -0.71
hog        -0.70
cards      -0.68
rets       -0.67
boats      -0.67
Positive Logits
omething   1.03
berra      1.01
hift       0.89
olation    0.85
paces      0.84
pace       0.83
onga       0.83
notation   0.78
hip        0.75
cale       0.75
Activation Density: 0.045%