INDEX
Explanations
phrases indicating size, quantity, or degree
assertions about performance relative to expectations
Negative Logits
Britann         -0.64
leted           -0.61
CRC             -0.59
irresistible    -0.57
legend          -0.56
DRAGON          -0.55
gorgeous        -0.55
legends         -0.55
ROCK            -0.55
forged          -0.54
Positive Logits
anymore         1.32
nor             1.22
nor             1.06
whatsoever      0.93
anywhere        0.89
enough          0.88
unless          0.82
yet             0.81
anything        0.80
necessarily     0.79
Activations Density: 0.792%