Explanations
specific symbols or unusual characters in the text
Negative Logits
â̝      -0.18
COVID   -0.17
ðŁĶ     -0.16
âĢį     -0.16
â̝      -0.15
â       -0.15
COVID   -0.15
ðŁ      -0.15
abbix   -0.15
ï¸ı     -0.15
Positive Logits
fucking  0.34
fuck     0.29
fucks    0.29
sod      0.28
fucked   0.28
Fucking  0.28
shit     0.27
FUCK     0.27
Fuck     0.27
Fucked   0.26
Activations Density: 0.029%