INDEX
Negative Logits
disp     -0.10
oku      -0.09
Annex    -0.08
ald      -0.08
merc     -0.08
asic     -0.08
ausal    -0.08
coder    -0.08
chan     -0.08
Agu      -0.08
Positive Logits
answer     0.31
answer     0.23
答案       0.22
answers    0.18
.answer    0.17
Answer     0.17
\tanswer   0.17
(answer    0.16
_answer    0.16
ans        0.16
Activations Density 0.030%