INDEX
Explanations
mentions of political figures/entities
references to specific people, places, or events
Negative Logits
)."         -0.45
]."         -0.45
).[         -0.42
)).         -0.41
".[         -0.40
.'"         -0.38
.''.        -0.38
]).         -0.37
catentry    -0.37
'."         -0.36
Positive Logits
ogie        0.41
ragon       0.34
haircut     0.34
hen         0.33
cruising    0.32
lycer       0.32
ucker       0.32
cients      0.32
apeshifter  0.31
ahime       0.30
Activations Density 4.668%