INDEX
Explanations
proper nouns identifying individuals in news reports
phrases that identify individuals or groups
Negative Logits
Token         Logit
itiveness     -0.78
rones         -0.77
OTAL          -0.74
Duration      -0.70
ITH           -0.69
raq           -0.69
Length        -0.69
amon          -0.68
Capture       -0.66
imize         -0.66
Positive Logits
Token          Logit
belonging      0.96
follows        0.92
pired          0.80
well           0.78
favoring       0.78
pires          0.78
opposed        0.76
having         0.76
conscientious  0.76
being          0.70
Activations Density: 0.103%