INDEX
Explanations
phrases indicating degrees of change or modification
Negative Logits
Token           Logit
DragonMagazine  -0.77
aring           -0.77
IRO             -0.68
urated          -0.67
psons           -0.64
Paste           -0.63
CRIPTION        -0.63
Ò               -0.62
rets            -0.61
ovic            -0.60
Positive Logits
Token           Logit
bit             0.81
stra            0.81
heartedly       0.79
longer          0.76
differently     0.75
stead           0.70
sooner          0.69
indist          0.68
ighter          0.67
farther         0.67
Activations Density: 0.100%