NEGATIVE LOGITS
Clifford      -0.07
(prog         -0.07
ưng           -0.07
*.            -0.07
Four          -0.06
Kem           -0.06
障            -0.06
militar       -0.06
Flow          -0.06
liberation    -0.06
POSITIVE LOGITS
honest        0.19
honesty       0.14
Honest        0.11
honestly      0.09
truthful      0.08
dishonest     0.08
onest         0.07
Lös           0.07
Honestly      0.07
HK            0.07
Activations Density 0.006%