Negative Logits
wini          -0.08
¶             -0.08
Ian           -0.07
rate          -0.07
Ł             -0.07
gete          -0.07
regen         -0.07
revision      -0.07
جعل           -0.07   (Arabic: "made")
Byrne         -0.07

Positive Logits
questionable   0.13
risky          0.13
tempting       0.12
pelig          0.12
temptation     0.12
tempted        0.12
tempt          0.11
誘惑           0.11   (Japanese/Chinese: "temptation")
unsafe         0.10
dubious        0.10

Activation Density: 0.075%