INDEX
Explanations
references to fictional characters or elements from various stories or franchises
Negative Logits
upal         -0.16
Sponge       -0.15
Gün          -0.15
UF           -0.14
paddle       -0.14
Shut         -0.14
721          -0.14
Programming  -0.14
sponge       -0.14
padd         -0.14
Positive Logits
Witch        0.23
witch        0.17
witch        0.17
Baths        0.16
Vys          0.15
odash        0.15
Ger          0.15
麻           0.15
.sz          0.15
Warsaw       0.15
Activations Density: 0.004%