INDEX
Explanations
references to fur or furry-related content
Negative Logits
aal        -0.17
es         -0.16
egasus     -0.16
lifting    -0.16
odia       -0.15
perse      -0.15
chk        -0.15
emit       -0.15
century    -0.15
Century    -0.15
Positive Logits
thest      0.27
iously     0.23
ioso       0.23
iosa       0.22
riers      0.22
rier       0.20
thers      0.19
phy        0.19
fur        0.18
uristic    0.18
Activation Density: 0.005%