Explanations
pauses and trailing punctuation
Negative Logits
awesome  0.78
めっちゃ  0.77
啥  0.77
dudes  0.72
dude  0.72
banget  0.60
Awesome  0.57
aka  0.55
awesome  0.55
weird  0.54
Positive Logits
-”  0.86
…”  0.82
—”  0.78
...”  0.73
…"  0.67
…  0.66
?”  0.66
..”  0.64
--’  0.64
--"  0.62
Activations Density 0.334%
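
The page does not define the density figure. As a rough sketch, assuming it means the fraction of sampled tokens on which the feature's activation is above zero, it could be computed as below. The function name `activation_density`, the `threshold` parameter, and the sample data are hypothetical and used only for illustration.

```python
from typing import Iterable


def activation_density(activations: Iterable[float], threshold: float = 0.0) -> float:
    """Fraction of tokens whose activation exceeds `threshold`.

    Assumed definition; the source page does not state how density is computed.
    """
    values = list(activations)
    if not values:
        raise ValueError("need at least one activation value")
    active = sum(1 for a in values if a > threshold)
    return active / len(values)


# Hypothetical data: 1000 tokens, 3 of which activate the feature.
sample = [0.0] * 997 + [1.2, 0.4, 2.7]
density = activation_density(sample)
print(f"{density:.3%}")  # prints 0.300%
```

Under this reading, a density of 0.334% would mean the feature fires on roughly 1 in every 300 tokens.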