INDEX
Explanations
phrases related to expressing opinions or beliefs
references to personal opinions or viewpoints
Negative Logits
Mamm      -0.77
trap      -0.73
ursed     -0.65
Sequ      -0.62
Maid      -0.62
Dent      -0.60
Reflex    -0.60
amaz      -0.60
moon      -0.58
ALL       -0.58
Positive Logits
yip            0.96
opinions       0.91
views          0.91
viewpoints     0.88
hops           0.86
guiActiveUn    0.80
beliefs        0.79
rences         0.78
chool          0.78
reprene        0.77
Activations Density 0.026%