INDEX
Explanations
phrases related to expressing personal viewpoints or beliefs
various forms of the word "opinion."
Negative Logits
tons      -0.75
bridge    -0.74
trap      -0.72
mind      -0.70
friends   -0.69
bang      -0.68
mers      -0.67
enegger   -0.67
mable     -0.67
ammers    -0.66
Positive Logits
ated            1.07
polls           0.96
piece           0.93
atorial         0.93
opinions        0.88
opinion         0.86
largeDownload   0.81
ately           0.79
polling         0.79
atively         0.79
Activations Density 0.037%