INDEX
Explanations
phrases indicating significant improvements or advancements
Negative Logits
essee           -0.88
Interstitial    -0.76
anyahu          -0.71
entary          -0.66
urally          -0.65
Beware          -0.62
zza             -0.60
Sins            -0.60
XM              -0.60
gel             -0.60
Positive Logits
frog            1.23
leaps           0.98
olate           0.85
forward         0.80
hemer           0.79
roads           0.78
Forward         0.76
ering           0.75
olicy           0.74
arts            0.74
Activations Density 0.017%