INDEX
Explanations
phrases related to expressing personal viewpoints or beliefs
mentions of personal opinions
Negative Logits
Gutenberg    -0.78
gm           -0.68
Roose        -0.67
ammers       -0.66
artifacts    -0.66
ammy         -0.65
Mamm         -0.64
artney       -0.64
estones      -0.63
Danger       -0.63
Positive Logits
opinion          0.90
opinions         0.87
largeDownload    0.82
piece            0.76
polls            0.75
opin             0.74
atively          0.73
bias             0.72
eering           0.72
paralysis        0.70
Activation density: 0.020%
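To make the 0.020% density figure concrete: assuming density means the share of sampled tokens on which the feature's activation is strictly positive (the dashboard does not state its exact definition or sample size), 0.020% corresponds to roughly 1 firing token in every 5,000. The sketch below computes that percentage; the function name `activation_density`, the zero threshold, and the sample data are hypothetical illustrations.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Return the percentage of activations strictly above `threshold`.

    Assumption: density = (# tokens where the feature fires) / (# tokens), as a percent.
    """
    if not activations:
        raise ValueError("need at least one activation")
    firing = sum(1 for a in activations if a > threshold)
    return 100.0 * firing / len(activations)


# Hypothetical sample: the feature fires on 2 of 10,000 tokens.
sample = [0.0] * 10_000
sample[17] = 3.2
sample[4_242] = 0.9
print(f"{activation_density(sample):.3f}%")  # 0.020%
```

A sparse feature like this one is inactive on almost every token, which is why the explanations above describe a narrow behavior (expressing personal opinions) rather than a broad one.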