INDEX
Explanations
Fractions written as X/Y, where the numerator and denominator are both single-digit numbers
Negative Logits
Token       Logit
Gret        -0.68
bang        -0.67
Vul         -0.66
Merry       -0.64
hon         -0.64
obscene     -0.63
Chop        -0.63
Gavin       -0.63
Mir         -0.62
farewell    -0.62
Positive Logits
Token       Logit
3           0.94
2           0.91
4           0.87
lvl         0.86
week        0.85
DAY         0.84
oct         0.84
division    0.80
5           0.77
OTT         0.77
Activations Density: 0.020%