INDEX
Explanations
phrases expressing disapproval or surprise
expressions of disbelief or disappointment regarding situations or events
Negative Logits
ãĥīãĥ©    -0.81
aukee     -0.78
orage     -0.77
ãĥ¼ãĤ¯    -0.71
iaz       -0.70
tails     -0.70
iHUD      -0.70
ä¹        -0.68
æĿ        -0.67
vc        -0.67
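Several entries in the negative-logit list (for example `ãĥīãĥ©` and `ãĥ¼ãĤ¯`) look like byte-level BPE token strings rather than readable text. The sketch below shows one way to turn them back into characters. It assumes a GPT-2-style byte-to-character table, which the source does not confirm; the helper names `bytes_to_unicode` and `decode_token` are chosen here for illustration.

```python
def bytes_to_unicode():
    """Byte -> printable-character table as used by GPT-2-style byte-level BPE.

    Assumption: the tokens in this dump come from such a tokenizer.
    """
    printable = (
        list(range(ord("!"), ord("~") + 1))
        + list(range(ord("¡"), ord("¬") + 1))
        + list(range(ord("®"), ord("ÿ") + 1))
    )
    mapping = {b: chr(b) for b in printable}
    shift = 0
    # Bytes without a printable form are assigned code points from 256 upward.
    for b in range(256):
        if b not in mapping:
            mapping[b] = chr(256 + shift)
            shift += 1
    return mapping


# Inverse table: displayed character -> original byte value.
CHAR_TO_BYTE = {c: b for b, c in bytes_to_unicode().items()}


def decode_token(token: str) -> str:
    """Map each displayed character back to its byte, then decode as UTF-8.

    Incomplete multi-byte sequences are shown as U+FFFD instead of raising.
    """
    raw = bytes(CHAR_TO_BYTE[ch] for ch in token)
    return raw.decode("utf-8", errors="replace")


if __name__ == "__main__":
    for tok in ["ãĥīãĥ©", "ãĥ¼ãĤ¯", "ä¹", "aukee"]:
        print(repr(tok), "->", repr(decode_token(tok)))
```

Under that assumption, `ãĥīãĥ©` and `ãĥ¼ãĤ¯` decode to katakana fragments, while `ä¹` and `æĿ` are only the first two bytes of a three-byte UTF-8 character and cannot be decoded on their own. ASCII tokens such as `aukee` pass through unchanged.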
Positive Logits
someone   1.08
somebody  0.98
anyone    0.93
they      0.87
nobody    0.85
we        0.82
someone   0.81
anybody   0.81
people    0.74
THEY      0.73
Activations Density 0.107%