INDEX
NEGATIVE LOGITS
bom         -0.06
Pax         -0.06
nex         -0.06
bum         -0.06
external    -0.06
Tracking    -0.06
Mov         -0.06
precious    -0.06
*_          -0.06
hor         -0.05
POSITIVE LOGITS
difficulty      0.13
difficulties    0.10
trouble         0.07
uteč            0.07
하지            0.07
IDI             0.07
har             0.07
utherford       0.07
iculty          0.07
Griffith        0.07
Activations Density: 0.007%