INDEX
Explanations
prepositional phrases indicating relationships or connections
Negative Logits
incial       -0.89
afia         -0.75
idences      -0.73
ederal       -0.71
cci          -0.70
merce        -0.70
ossibility   -0.70
anguage      -0.70
aren         -0.69
ignt         -0.69
Positive Logits
nowhere      0.71
Rowe         0.67
mole         0.65
Gillespie    0.64
Bris         0.63
Buster       0.61
existence    0.60
him          0.59
everything   0.59
them         0.58
Activations Density: 0.025%