INDEX
Explanations
mentions of "RD" followed by a number
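As a rough illustration of what the explanation describes, the sketch below flags text spans where "RD" is followed by a number. It is only a hypothetical approximation of the explanation, not how the feature itself computes anything. The function name `find_mentions`, the case-insensitive matching, and the optional single space between "RD" and the digits are all assumptions.

```python
import re

# Hypothetical approximation of the explanation: "RD" (any case, an assumption)
# at a word boundary, an optional single whitespace character, then one or more digits.
# The leading \b keeps ordinals like "3rd" from matching, because "3" and "r"
# are both word characters.
RD_NUMBER = re.compile(r"\bRD\s?\d+", re.IGNORECASE)


def find_mentions(text: str) -> list[str]:
    """Return every substring matching the assumed 'RD followed by a number' pattern."""
    return RD_NUMBER.findall(text)


if __name__ == "__main__":
    print(find_mentions("Turn onto RD 42 then rd7."))
```

Under these assumptions, "RD 42" and "rd7" match, while "3rd 5" does not.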
Negative Logits
aarrggbb        -1.09
Bever           -0.76
ModelRenderer   -0.74
lyre            -0.71
Snowy           -0.71
Rij             -0.70
upaten          -0.69
Neuk            -0.68
😍😍            -0.68
SNR             -0.67
Positive Logits
RD              1.48
RD              1.18
rd              1.08
rd              0.94
findpost        0.92
Carden          0.85
Rd              0.79
لينكات          0.77
ptonshire       0.76
Rd              0.75
Activation Density: 0.010%