Explanations
expressions of strong emotions or feelings
Negative Logits
Token       Logit
______      -0.16
[           -0.15
(           -0.14
borg        -0.14
prompt      -0.14
etc         -0.13
gin         -0.13
recieved    -0.13
ination     -0.13
nation      -0.13
Positive Logits
Token       Logit
fucking     0.22
fucked      0.21
fuck        0.20
fuck        0.20
asshole     0.19
fucks       0.18
vids        0.18
shitty      0.18
assh        0.17
nos         0.17
Activations Density 0.000%