Explanations
references to fictional characters and creative works
Negative Logits
upal      -0.18
paddle    -0.17
padd      -0.16
.uf       -0.15
Flake     -0.14
Gün       -0.14
ubby      -0.14
tn        -0.14
tank      -0.14
Shut      -0.14
Positive Logits
Witch     0.24
Ger       0.19
witch     0.18
Ger       0.18
aska      0.17
Sabb      0.17
witch     0.17
Netflix   0.16
Polish    0.16
麻        0.16
Activations Density: 0.007%