INDEX
Explanations
instances of the special character "ðŁ"
Negative Logits
...           -0.17
Brexit        -0.15
...,          -0.14
,...          -0.13
SpaceX        -0.13
Tesla         -0.13
BYTES         -0.13
...↵↵         -0.12
achel         -0.12
criticised    -0.12
Positive Logits
Ger           0.24
bullshit      0.21
fucking       0.20
fucked        0.19
fuck          0.19
Fucking       0.19
Ger           0.18
Fuck          0.18
Cro           0.17
FUCK          0.17
Activations Density 0.004%