Explanations
correctness and alternative phrasing
Negative Logits
belirl         1.30
connaissances  1.21
୯              1.16
conocimiento   1.14
знания         1.12
Ⅷ              1.10
ා              1.10
}}\|           1.09
знаний         1.09
conoscenze     1.09
Positive Logits
would          1.06
seems          1.00
simpler        0.94
toast          0.94
seeming        0.94
implying       0.94
defies         0.93
nicer          0.93
louder         0.92
translates     0.92
Activations Density: 0.122%