INDEX
Explanations
Long sequences of dashes or underscores.
Negative Logits
tocin  -0.78
fees  -0.76
gok  -0.76
Ulysses  -0.75
########.  -0.74
Hancock  -0.73
Waugh  -0.73
Suzy  -0.73
Skyl  -0.73
θρώ  -0.72
Positive Logits
(unrendered token)  1.52
(unrendered token)  1.21
(unrendered token)  1.07
="#"><  0.91
(unrendered token)  0.91
DPM  0.88
Lili  0.82
rd  0.82
ABCD  0.82
Lili  0.81
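Logit lists like the ones above are commonly produced by projecting a feature's decoder direction through the model's unembedding matrix and ranking the per-token scores. The source does not say how these particular values were computed, so the sketch below is only an illustration under that assumption; `logit_effects`, `top_and_bottom`, and the toy weights and vocabulary are hypothetical.

```python
from typing import List, Tuple


def logit_effects(
    decoder_dir: List[float], w_u: List[List[float]], vocab: List[str]
) -> List[Tuple[str, float]]:
    """Score each vocab token by decoder_dir @ W_U (assumed method, hypothetical helper).

    w_u[i][j] is the unembedding weight from residual dimension i to token j.
    """
    scores = []
    for j, tok in enumerate(vocab):
        s = sum(decoder_dir[i] * w_u[i][j] for i in range(len(decoder_dir)))
        scores.append((tok, s))
    return scores


def top_and_bottom(scores: List[Tuple[str, float]], k: int):
    """Return the k most positive and the k most negative (token, score) pairs."""
    ranked = sorted(scores, key=lambda pair: pair[1])
    return ranked[::-1][:k], ranked[:k]


# Toy example: 2-dimensional residual stream, 3-token vocabulary (all hypothetical).
scores = logit_effects([1.0, 0.5], [[1.0, -1.0, 0.0], [0.0, 2.0, -2.0]], ["a", "b", "c"])
top, bottom = top_and_bottom(scores, 1)
print(top)     # [('a', 1.0)]
print(bottom)  # [('c', -1.0)]
```

Real dashboards would do the same projection with a matrix library over a vocabulary of tens of thousands of tokens; plain lists keep the sketch dependency-free.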
Activation density: 0.046%
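Activation density is usually read as the percentage of token positions on which the feature fires. The sketch below assumes "fires" means an activation strictly above zero, which the source does not state; `activation_density` and the synthetic activations are hypothetical.

```python
from typing import List


def activation_density(activations: List[float], threshold: float = 0.0) -> float:
    """Percentage of token positions whose activation exceeds `threshold` (assumed definition)."""
    if not activations:
        raise ValueError("need at least one activation")
    fired = sum(1 for a in activations if a > threshold)
    return 100.0 * fired / len(activations)


# Hypothetical data: the feature fires on 46 of 100,000 token positions.
acts = [0.0] * 100_000
for i in range(46):
    acts[i] = 1.3
print(f"{activation_density(acts):.3f}%")  # 0.046%
```

A density this low means the feature is active on roughly one token in two thousand, which fits a narrow pattern such as long runs of dashes or underscores.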