Explanations
mentions of reference points and citations in academic or technical texts
Negative Logits
erna    -0.18
ern     -0.18
ish     -0.15
de      -0.15
оз      -0.14
ông     -0.14
ÎĶια    -0.14
Ì£      -0.14
trad    -0.14
Mell    -0.14
Positive Logits
izes            0.20
/Instruction    0.17
andum           0.17
NÄĽm            0.16
ién             0.16
coni            0.16
ourcem          0.16
/reference      0.15
resher          0.15
peating         0.15
Activations Density: 0.017%