INDEX
NEGATIVE LOGITS
junction       -0.08
isasi          -0.07
importantly    -0.07
apparently     -0.07
uniquely       -0.07
prisoner       -0.07
morality       -0.07
bhar           -0.07
'all           -0.07
desemb         -0.07
POSITIVE LOGITS
folos          0.08
Highlight      0.08
               0.08
wykorzyst      0.08
elder          0.07
RTX            0.07
Motorsport     0.07
türk           0.07
Diverse        0.07
çeşit          0.07
Activations Density 0.002%