Explanations
"I know, I know" admissions
Negative Logits
although  0.52
တော့  0.42
Although  0.41
although  0.40
虽然  0.40
certainly  0.39
experience  0.38
虽然  0.38
roffenen  0.38
不管  0.38
Positive Logits
Ideally  0.45
ぉ  0.45
oooo  0.44
!!!  0.43
sorry  0.43
!!!!  0.42
Yea  0.41
!!!"  0.40
Leaf  0.40
ironic  0.40
Activations Density 0.007%