INDEX
Explanations
specific characters or symbols in the text
Negative Logits
Token          Logit
——             -0.16
fuck           -0.16
ÑijÑĢ           -0.15
FUCK           -0.15
shitty         -0.14
âĢIJ            -0.14
valueForKey    -0.14
fucking        -0.14
‘              -0.14
’              -0.14
Positive Logits
Token          Logit
Explorer       0.17
--             0.16
elsewhere      0.16
Else           0.15
Privacy        0.14
trimmed        0.14
enegro         0.14
jinak          0.14
Experts        0.14
Flores         0.14
Activations Density 0.003%