Explanations
names or identities of individuals in news articles
phrases that indicate possession or reference to individuals or entities
Negative Logits
nodd        -0.69
leep        -0.68
acs         -0.68
alright     -0.66
inis        -0.64
dayName     -0.62
misunder    -0.60
anytime     -0.60
arrang      -0.60
ウス        -0.59
Positive Logits
those        0.89
his          0.86
its          0.86
the          0.78
their        0.77
three        0.73
them         0.73
several      0.72
Britain      0.70
four         0.70
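The two lists rank vocabulary tokens by how strongly this feature pushes the model's output logits down or up. Below is a minimal sketch of one common way such scores are produced. It assumes, without confirmation from this page, that each token's score is the dot product of the feature's decoder direction with that token's unembedding vector. The function names `logit_effects` and `top_and_bottom` and the toy vectors are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple


def logit_effects(
    decoder_dir: Sequence[float], unembed: Dict[str, Sequence[float]]
) -> Dict[str, float]:
    """Score each token by the dot product of the feature's decoder direction
    with that token's unembedding vector (an assumed convention)."""
    return {
        token: sum(d * u for d, u in zip(decoder_dir, vec))
        for token, vec in unembed.items()
    }


def top_and_bottom(
    effects: Dict[str, float], k: int
) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return the k most positive and the k most negative (token, score) pairs."""
    ranked = sorted(effects.items(), key=lambda kv: kv[1])  # ascending by score
    bottom = ranked[:k]     # most negative first
    top = ranked[::-1][:k]  # most positive first
    return top, bottom


# Toy 2-d example with made-up vectors (not real model weights).
decoder_dir = [1.0, 0.5]
unembed = {
    "those": [0.5, 0.5],
    "the": [0.25, 0.5],
    "nodd": [-0.5, -0.25],
}
effects = logit_effects(decoder_dir, unembed)
top, bottom = top_and_bottom(effects, k=2)
print(top)     # [('those', 0.75), ('the', 0.5)]
print(bottom)  # [('nodd', -0.625), ('the', 0.5)]
```

With a three-token toy vocabulary, "the" appears in both lists. Over a full vocabulary of tens of thousands of tokens, the top and bottom lists would not normally overlap.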
Activation Density: 0.096%
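Activation density is usually read as the share of token positions on which the feature fires. At 0.096%, that is roughly 96 tokens per 100,000. The sketch below assumes that "fires" means an activation strictly greater than zero. The function name `activation_density` and the sample data are illustrative only.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Percentage of token positions whose activation exceeds `threshold`."""
    if len(activations) == 0:
        raise ValueError("need at least one activation")
    fired = sum(1 for a in activations if a > threshold)
    return 100.0 * fired / len(activations)


# Hypothetical sample: the feature fires on 1 of 1,000 tokens.
sample = [0.0] * 999 + [2.3]
print(activation_density(sample))  # 0.1
```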