INDEX
Explanations
expressions of personal opinions
the phrase "I would say" and its variations expressing personal opinions
Negative Logits
uras             -0.71
hs               -0.69
Written          -0.65
vas              -0.65
CW               -0.64
icz              -0.63
HOU              -0.63
emen             -0.63
onut             -0.62
eatured          -0.62
Positive Logits
goodbye           0.89
anecd             0.73
that              0.71
hello             0.70
è£ıè              0.67
congratulations   0.67
congr             0.62
95                0.61
Mub               0.60
yes               0.60
Activations Density 0.093%
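The density figure can be illustrated with a minimal sketch. It assumes that "Activations Density" means the fraction of sampled token positions on which the feature's activation is above zero; the page does not state this definition. The function name, the threshold parameter, and the sample data below are hypothetical and are not taken from the dashboard.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of token positions whose activation exceeds threshold.

    Assumption: density = (active positions) / (all sampled positions).
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical sample: 10,000 token positions, 9 of which are active.
sample = [0.0] * 9991 + [1.5] * 9
density = activation_density(sample)
print(f"{density:.3%}")
```

Under that assumption, a density of 0.093% would correspond to roughly 93 active positions per 100,000 sampled tokens.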