Negative Logits
Token     Logit
Bora      -0.08
residual  -0.08
Residual  -0.08
Byzant    -0.08
nasled    -0.07
bespoke   -0.07
neglig    -0.07
boj       -0.07
matag     -0.07
paub      -0.07
Positive Logits
Token      Logit
evidence   0.10
backing    0.10
തെള        0.10
evid       0.09
证         0.09
証         0.09
Evidence   0.09
Evidence   0.09
assistant  0.09
preuves    0.09
Activation Density: 0.019%