INDEX
NEGATIVE LOGITS
Token     Logit
iram      -0.18
iron      -0.16
切り      -0.15
sig       -0.15
eder      -0.15
elling    -0.15
ires      -0.15
остат     -0.15
ooth      -0.14
ez        -0.14
POSITIVE LOGITS
Token     Logit
ence      0.22
ent       0.21
ently     0.19
leta      0.19
ENCE      0.19
aceous    0.17
ins       0.17
viol      0.17
-viol     0.17
ations    0.16
Activations Density 0.005%