Explanations
punctuation marks and sentence endings
End of sentences
Negative Logits
muß    -0.86
・・・・・    -0.85
läßt    -0.82
・・・・    -0.81
müßte    -0.69
-0.66
ㄋ    -0.65
.......    -0.65
unsurpassed    -0.65
mußte    -0.64
Positive Logits
Idk    1.08
shitty    1.05
idk    1.05
fucked    1.03
fucking    1.03
idk    1.02
tbh    1.02
goddamn    0.99
FUCKING    0.98
weirdly    0.98
Activations Density 0.312%