INDEX
Explanations
phrases related to discussions, analysis, and questions
articles and quantifiers related to descriptions or classifications
Negative Logits
Token         Logit
agues         -0.80
adj           -0.76
Events        -0.74
ees           -0.72
abilities     -0.71
enemy         -0.71
Orig          -0.70
Experts       -0.70
effects       -0.69
osponsors     -0.69
Positive Logits
Token             Logit
combination       1.19
handshake         1.12
mixture           1.08
simple            1.08
straightforward   1.05
bunch             1.04
declaration       1.00
cknowled          0.99
bang              0.99
willingness       0.98
Activations Density: 0.268%