INDEX
Explanations
phrases expressing comparisons or preferences
phrases indicating improvement or betterment
Negative Logits
igham      -0.82
quin       -0.68
atan       -0.61
unpop      -0.59
scissors   -0.56
mod        -0.56
Cont       -0.55
Command    -0.55
itars      -0.55
ADD        -0.55
Positive Logits
than       1.17
than       1.15
Than       1.05
chances    0.77
outcomes   0.69
aroo       0.66
spelling   0.62
Recon      0.62
oldemort   0.61
iate       0.61
Activations Density 0.155%