Explanations
Text that is followed by an explanation or a statement of purpose
Negative Logits
0.35
etc
0.34
Such
0.32
오늘은
0.31
etc
0.31
0.31
이는
0.31
$\
0.30
Which
0.29
或
0.29
Positive Logits
twofold     0.85
threefold   0.75
supposed    0.57
simply      0.54
probably    0.52
called      0.51
comprised   0.50
akin        0.50
actually    0.49
centered    0.47
Activations Density 0.171%