INDEX
Explanations
acknowledging informal or confrontational inputs
Negative Logits
"!      0.52
편리    0.50
wonderful    0.48
'!      0.47
exciting    0.45
Exc     0.45
!।      0.44
”!      0.44
Exc     0.43
답니다    0.43
Positive Logits
idk     1.02
tbh     0.96
lmao    0.94
shit    0.91
honestly    0.88
dude    0.88
fucked    0.85
weird   0.83
shitty    0.82
Fuck    0.82
Activations Density 0.010%