INDEX
Explanations
phrases related to commands or instructions
phrases that indicate warnings or prohibitions
Negative Logits
effic         -0.62
basics        -0.60
exemplary     -0.57
wonderfully   -0.56
rocal         -0.56
admirable     -0.55
excellent     -0.55
awesome       -0.54
unparalleled  -0.54
amazing       -0.54
Positive Logits
anymore       1.88
unless        1.79
unless        1.66
nor           1.41
lest          1.36
until         1.33
until         1.33
because       1.28
anytime       1.26
whatsoever    1.21
Activations Density: 0.738%