Explanations
proper nouns or names of entities such as countries, organizations, or specific laws
highly impactful statements or events in news-related contexts
Negative Logits
undecided     -0.80
blocker       -0.74
democrat      -0.73
persuasion    -0.71
guy           -0.70
mustache      -0.70
waterfall     -0.69
deflation     -0.69
grapp         -0.69
sails         -0.68
Positive Logits
Interested    1.57
According     1.54
The           1.46
Writing       1.41
Sources       1.38
Speaking      1.34
Instead       1.32
Following     1.32
In            1.30
While         1.30
Activations Density 0.265%