Explanations
references to media commentary or cultural critique
Followed by punctuation
goddamn fucking fuck
Negative Logits
]-->               -0.80
"!                 -0.67
!                  -0.66
”!                 -0.66
'!                 -0.62
gjelder            -0.61
!"                 -0.61
Statics            -0.61
CreateTagHelper    -0.60
AssemblyVersion    -0.59
Positive Logits
fucking            1.10
goddamn            1.02
FUCKING            0.94
fucking            0.91
fuckin             0.90
fucked             0.89
fuck               0.84
Fucking            0.84
fucks              0.83
FUCK               0.83
Activations Density: 0.343%