NEGATIVE LOGITS
("&       -0.07
Math      -0.07
cand      -0.06
.fasta    -0.06
자세      -0.06
english   -0.06
minutos   -0.06
ad        -0.06
empathy   -0.06
didn      -0.06
POSITIVE LOGITS
strike    0.15
Strike    0.11
Strike    0.11
strikes   0.11
struck    0.10
striking  0.10
strike    0.09
_strike   0.09
,↵↵       0.08
ke        0.08
Activations Density 0.008%