Explanations
uniquely formatted or encoded characters and their associated values
Negative Logits
Token         Logit
..."↵         -0.14
Âĸ            -0.14
Âł↵           -0.14
ÂĶ            -0.14
â̦           -0.14
youngsters    -0.13
â̦↵          -0.13
}`            -0.12
zbo           -0.12
fractional    -0.12
Positive Logits
Token         Logit
FML           0.40
FML           0.31
fucking       0.31
FUCK          0.29
fucked        0.28
fuck          0.27
Fuck          0.27
Fucking       0.27
Fuck          0.25
cunt          0.24
Activations Density 0.007%
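
The dashboard does not say how the density figure is computed. A minimal sketch follows, assuming it is the fraction of sampled tokens on which the feature's activation is above zero. The function names, the `threshold` parameter, and the sample data are hypothetical and only illustrate the arithmetic.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of activations strictly above `threshold`.

    Assumption: "density" means the share of sampled tokens on which
    the feature fires at all; the source does not define the term.
    """
    values = list(activations)
    if not values:
        raise ValueError("activations must not be empty")
    active = sum(1 for a in values if a > threshold)
    return active / len(values)


def format_density(density):
    """Render a density fraction as a percentage with three decimals."""
    return f"{density * 100:.3f}%"


# Hypothetical sample: the feature fires on 7 of 100,000 tokens.
sample = [0.0] * 99_993 + [1.2] * 7
print(format_density(activation_density(sample)))
```

Under this reading, a density of 0.007% means the feature is active on roughly 7 out of every 100,000 tokens.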