Explanations
phrases expressing positive or negative evaluations and judgments
Negative Logits
Token     Logit
\'        -0.76
onut      -0.66
ench      -0.65
ohyd      -0.64
kee       -0.62
acca      -0.62
ilian     -0.61
ivalry    -0.59
haw       -0.58
cies      -0.58
Positive Logits
Token     Logit
someday   0.83
tomorrow  0.81
if        0.78
Osw       0.69
Wouldn    0.68
sooner    0.67
wiser     0.67
morrow    0.66
feas      0.65
forever   0.65
Activations Density: 0.191%