Explanations
phrases related to opinions or beliefs
phrases indicating opinions, beliefs, or claims about various subjects
Negative Logits
Token        Logit
ワ           -0.68
-+-+-+-+     -0.59
tesy         -0.59
deed         -0.57
ainment      -0.56
ammy         -0.55
Sieg         -0.55
liaison      -0.54
Himself      -0.54
ou           -0.53
Positive Logits
Token            Logit
underest         0.77
underestimate    0.76
misconceptions   0.71
skepticism       0.67
mares            0.66
explanations     0.64
incorrectly      0.64
conspiracy       0.63
alike            0.63
myths            0.63
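Lists like these are typically the k highest- and lowest-scoring vocabulary entries of a per-token score vector. The dashboard does not say how its scores were produced (a common choice is projecting the feature's decoder direction through the unembedding matrix, but that is an assumption here). The following is a minimal sketch of the ranking step only, assuming a token-to-score mapping already exists; `rank_logits` is a hypothetical helper, and the sample scores are copied from the lists above.

```python
import heapq


def rank_logits(scores, k):
    """Return the k highest- and k lowest-scoring (token, score) pairs.

    `scores` is assumed to map each token string to a single float score.
    """
    top = heapq.nlargest(k, scores.items(), key=lambda kv: kv[1])
    bottom = heapq.nsmallest(k, scores.items(), key=lambda kv: kv[1])
    return top, bottom


# A handful of scores taken from the dashboard's lists.
scores = {
    "underest": 0.77,
    "underestimate": 0.76,
    "myths": 0.63,
    "Himself": -0.54,
    "tesy": -0.59,
    "-+-+-+-+": -0.59,
}
top, bottom = rank_logits(scores, 2)
print(top)  # [('underest', 0.77), ('underestimate', 0.76)]
print(bottom)
```

Tied scores (such as the two -0.59 entries) come back in the mapping's insertion order, because `heapq.nsmallest` with a key behaves like a stable sort.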
Activation density: 0.450%
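Activation density is usually read as the percentage of sampled tokens on which the feature fires. The dashboard does not state its sample size or firing threshold. The following is a minimal sketch assuming a threshold of zero and a flat list of per-token activations; `activation_density` and the synthetic sample are hypothetical and are not the dashboard's data.

```python
def activation_density(activations, threshold=0.0):
    """Return the percentage of tokens whose activation exceeds `threshold`."""
    if not activations:
        raise ValueError("need at least one activation")
    firing = sum(1 for a in activations if a > threshold)
    return 100.0 * firing / len(activations)


# Synthetic sample: 9 firing tokens out of 2,000 gives a density of 0.45%.
acts = [0.0] * 1991 + [1.2] * 9
print(f"{activation_density(acts):.3f}%")  # prints "0.450%"
```

A density this low means the feature is silent on the vast majority of tokens, which is consistent with the narrow explanations listed above.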