INDEX
Explanations
names of individuals and entities in news articles
proper nouns related to people and their affiliations
Negative Logits
likes        -0.64
---------    -0.64
demands      -0.59
DOES         -0.58
bends        -0.58
needs        -0.58
persists     -0.57
wakes        -0.57
conqu        -0.57
doesnt       -0.56
Positive Logits
respectively   1.70
jointly        1.17
were           1.15
discuss        1.14
are            1.11
collaborate    1.05
were           1.05
both           1.03
collide        1.01
both           0.99
Activations Density: 0.373%