Explanations
gambling, alcohol, sex, or crime related completions
NEGATIVE LOGITS
Token            Logit
is               0.46
Weierstrass      0.41
underwhelming    0.40
kilogram         0.39
-{\              0.39
cyclohex         0.39
):               0.38
uppercase        0.38
has              0.38
arugula          0.37
POSITIVE LOGITS
Token            Logit
”                0.93
”,               0.80
"                0.79
’’               0.78
」               0.77
”,               0.75
”、              0.74
”،               0.74
”—               0.72
rdquo            0.70
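The page does not say how the two logit lists are produced. Dashboards of this kind commonly project a feature's decoder direction through the model's unembedding matrix and report the tokens with the highest and lowest results. The sketch below illustrates that idea under exactly that assumption. The names `logit_effects`, `top_and_bottom`, `decoder_dir`, and `w_u` are hypothetical and not taken from any particular tool. Results are kept as (token, value) pairs rather than a dict, because the same printed token (for example `”,` above) can appear more than once.

```python
from typing import List, Sequence, Tuple

Pair = Tuple[str, float]


def logit_effects(decoder_dir: Sequence[float],
                  w_u: Sequence[Sequence[float]],
                  vocab: Sequence[str]) -> List[Pair]:
    """Hypothetical sketch: dot the feature's decoder direction with each
    token's unembedding column.

    ASSUMPTION: w_u is indexed [model_dim][vocab_index], and the dashboard's
    logit values come from this projection (not confirmed by the page).
    """
    effects: List[Pair] = []
    for t, token in enumerate(vocab):
        value = sum(d * w_u[i][t] for i, d in enumerate(decoder_dir))
        effects.append((token, value))
    return effects


def top_and_bottom(effects: List[Pair], k: int) -> Tuple[List[Pair], List[Pair]]:
    """Return the k most positive and the k most negative (token, value) pairs."""
    top = sorted(effects, key=lambda p: p[1], reverse=True)[:k]
    bottom = sorted(effects, key=lambda p: p[1])[:k]  # most negative first
    return top, bottom


if __name__ == "__main__":
    # Toy 2-dimensional model with a 3-token vocabulary.
    eff = logit_effects([1.0, 0.0], [[0.5, -1.0, 2.0], [9.0, 9.0, 9.0]], ["a", "b", "c"])
    print(top_and_bottom(eff, 1))
```

In the toy run only the first model dimension contributes, so token `c` ranks highest and `b` lowest.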
Activations Density 0.293%
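A density of 0.293% is commonly read as the share of sampled token positions on which the feature activates. The sketch below shows that arithmetic. The zero threshold and the function name `activation_density` are assumptions for illustration, not the dashboard's documented definition.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Hypothetical sketch: percentage of token positions whose activation
    exceeds `threshold` (ASSUMED to be 0.0, i.e. "the feature fired")."""
    if not activations:
        raise ValueError("need at least one activation")
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


if __name__ == "__main__":
    # 3 active positions out of 1,000 gives a density of about 0.3%.
    sample = [0.0] * 997 + [1.2, 0.5, 3.0]
    print(f"{activation_density(sample):.3f}%")
```

Under this reading, 0.293% means the feature fires on roughly 3 of every 1,000 tokens, which marks it as a sparse feature.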