Explanations
Conversational phrases and informal speech patterns
Negative Logits
Token    Logit
avr      -0.15
iphy     -0.15
åı¸      -0.14
COVER    -0.14
(        -0.14
afs      -0.13
atown    -0.13
ostel    -0.13
apur     -0.13
omens    -0.13
Positive Logits
Token    Logit
fuck     0.26
man      0.25
fucked   0.23
fucks    0.23
shit     0.22
fuck     0.21
Fuck     0.20
cats     0.19
cat      0.19
Fuck     0.19
Activation Density: 0.001%