Explanations
words related to political figures or entities
references to the term "CM" or its variations in a variety of contexts
Negative Logits
lihood      -0.87
sidx        -0.78
gered       -0.75
enced       -0.75
ggies       -0.73
rets        -0.71
*/(         -0.71
reys        -0.70
ibility     -0.70
gew         -0.69
Positive Logits
Punk        0.86
ODE         0.81
ageddon     0.80
ancel       0.79
illon       0.79
ahon        0.77
trop        0.71
apper       0.71
ichael      0.67
ography     0.66
Activations Density 0.022%
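
The page does not say how the density figure is computed. One common reading is the fraction of sampled tokens on which the feature's activation is above zero. The sketch below illustrates that reading only; the function name `activation_density`, the `threshold` parameter, and the sample data are hypothetical and not taken from the source.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of token positions whose activation exceeds threshold.

    Assumption: "density" means the share of tokens on which the feature fires.
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    fired = sum(1 for a in activations if a > threshold)
    return fired / len(activations)


# Hypothetical data: the feature fires on 22 of 100,000 sampled tokens.
sample = [0.0] * 99_978 + [1.5] * 22
density = activation_density(sample)
print(f"{density:.3%}")  # prints 0.022%
```

Under this reading, a density of 0.022% would mean the feature is active on roughly 2 tokens in every 10,000 sampled.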