INDEX
Explanations
phrases expressing surprise or disbelief
Negative Logits
pour        -0.82
ahime       -0.77
antioxid    -0.73
iant        -0.65
redients    -0.65
åĤ          -0.64
PI          -0.63
achev       -0.63
uliffe      -0.62
imo         -0.61
Positive Logits
theless     1.67
bothered    1.09
dreamed     1.00
EVER        0.92
ceases      0.85
heard       0.85
doubted     0.82
dime        0.81
bother      0.81
imagined    0.81
Activations Density 0.292%