Explanations
No Explanations Found
Negative Logits
Token     Logit
elper     -0.19
emme      -0.15
--        -0.14
--;       -0.14
)--       -0.14
(`        -0.14
-&        -0.14
--[       -0.14
ostel     -0.14
'[        -0.14
Positive Logits
Token     Logit
fucking   0.23
fuck      0.17
Fucking   0.17
Fuck      0.17
FUCK      0.17
bullshit  0.17
Fuck      0.16
shit      0.16
fucked    0.16
Wonder    0.16
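Lists like these are commonly produced by projecting a feature's decoder direction through the model's unembedding matrix and reporting the most positive and most negative tokens. The page does not say how its numbers were computed, so the following is only a minimal sketch under that assumption. The names `feature_logits` and `bottom_and_top`, the list-of-rows layout for the unembedding matrix, and the toy values in the usage lines are all hypothetical and are not taken from the dashboard.

```python
# Hypothetical sketch: score every vocabulary token by the dot product between
# a feature's decoder direction and that token's column of the unembedding
# matrix, then take the k lowest and k highest scores.

def feature_logits(decoder_dir, unembed, vocab):
    """Return (token, score) pairs, one per vocabulary entry.

    decoder_dir: floats, length d_model (assumed feature decoder vector)
    unembed: d_model rows, each with len(vocab) floats (assumed W_U layout)
    vocab: token strings
    """
    scores = []
    for j, token in enumerate(vocab):
        score = sum(decoder_dir[i] * unembed[i][j] for i in range(len(decoder_dir)))
        scores.append((token, score))
    return scores


def bottom_and_top(scores, k):
    """Return (k most negative pairs, k most positive pairs)."""
    ranked = sorted(scores, key=lambda pair: pair[1])
    return ranked[:k], ranked[::-1][:k]


# Toy usage with invented numbers (not the values on this page).
vocab = ["tok_a", "tok_b", "tok_c", "tok_d", "tok_e"]
decoder_dir = [1.0, -0.5]
unembed = [
    [0.2, 0.1, -0.1, -0.05, 0.0],
    [0.0, -0.1, 0.2, 0.1, 0.3],
]
bottom, top = bottom_and_top(feature_logits(decoder_dir, unembed, vocab), 2)
for token, score in top:
    print(f"{token:<10}{score:.2f}")
```

Real models have vocabularies of tens of thousands of tokens, so an actual implementation would use a matrix library rather than nested Python loops; the loop form here is only meant to make the dot product explicit.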
Activation Density: 0.000%
No Known Activations
This feature has no known activations.
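A density of 0.000% is consistent with the empty activation list. The sketch below assumes that density means the percentage of sampled tokens on which the feature's activation exceeds a threshold (zero by default). `activation_density` is a hypothetical helper for illustration, not the dashboard's own code.

```python
# Hypothetical sketch: activation density as the percentage of sampled tokens
# whose feature activation is above a threshold.

def activation_density(activations, threshold=0.0):
    """Return the percentage of activations strictly greater than threshold."""
    if not activations:
        return 0.0  # no samples: report zero rather than divide by zero
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


print(f"{activation_density([0.0] * 1000):.3f}%")  # prints 0.000%
```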