Explanations
No Explanations Found
Negative Logits
leyen         -0.15
behavioural   -0.14
Haram         -0.14
favourable    -0.14
vil           -0.13
Honour        -0.13
Seeder        -0.13
lei           -0.13
patial        -0.13
ãĥ¼           -0.13
Positive Logits
fucking       0.25
fuck          0.25
fucks         0.23
fucked        0.23
fuck          0.23
Fucking       0.21
FUCK          0.20
Fuck          0.20
Fuck          0.19
cunt          0.17
Activations Density: 0.000%
No Known Activations
This feature has no known activations.