INDEX
NEGATIVE LOGITS
Retrie        -0.07
arena         -0.07
万            -0.07
pope          -0.07
geomet        -0.07
smashing      -0.07
"":↵          -0.07
crap          -0.06
worsening     -0.06
берем         -0.06
POSITIVE LOGITS
instructions   0.10
instructed     0.10
Instructions   0.10
instruction    0.09
Instructions   0.08
kInstruction   0.08
instructors    0.07
دستور          0.07
assign         0.07
Instructor     0.07
Activations Density: 0.021%