Explanations
references to personal privilege and societal issues
Negative Logits
Token             Logit
‘’                -0.66
již               -0.63
ıdır              -0.62
noOf              -0.62
,                 -0.61
¨                 -0.60
SourceChecksum    -0.59
didalam           -0.59
」                -0.59
","               -0.58
Positive Logits
Token             Logit
shitty            1.15
goddamn           1.13
fucking           1.10
fuck              1.05
fuck              1.02
shit              1.02
fucked            1.02
FUCKING           1.01
fucking           1.01
fuckin            1.00
Activations Density: 1.136%