Explanations
proper nouns related to politics, specific people, and organizations
phrases related to governmental or political processes
Negative Logits
.).    -0.70
!".    -0.66
]."    -0.66
".     -0.62
?".    -0.59
}.     -0.58
]).    -0.58
.</    -0.57
)).    -0.56
''.    -0.54
Positive Logits
odore       0.56
ngth        0.50
romeda      0.48
swers       0.47
ividual     0.47
reenshots   0.46
imaru       0.45
divided     0.44
Aberdeen    0.44
arcity      0.44
Activations Density: 2.640%