INDEX
Explanations
words related to actions involving physical movements
the presence of indefinite articles and descriptors in contexts depicting various scenarios
Negative Logits
annually    -0.76
iments      -0.74
endeavors   -0.70
Own         -0.69
views       -0.66
Hon         -0.66
Merit       -0.66
achu        -0.65
Includes    -0.65
America     -0.65
Positive Logits
couple      1.10
handful     1.07
few         1.05
nearby      1.00
bunch       0.96
lot         0.95
flurry      0.93
flashback   0.93
friend      0.88
bystand     0.86
Activations Density 0.472%
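
As a rough illustration of what a density figure like the one above could mean, the sketch below treats it as the fraction of token positions on which the feature's activation is above zero. That reading is an assumption, not something stated on this page. The function name `activation_density`, the `threshold` parameter, and the sample data are all hypothetical.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of activations strictly above `threshold`.

    Assumption: "density" means the share of token positions where the
    feature fires at all; the dashboard may define it differently.
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical data: 1000 token positions, 5 of which are active.
sample = [0.0] * 995 + [0.3, 1.2, 0.7, 2.1, 0.05]
density = activation_density(sample)
print(f"{density:.3%}")  # prints 0.500%
```

Under this reading, a density of 0.472% would correspond to the feature firing on roughly 47 of every 10,000 tokens sampled.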