NEGATIVE LOGITS
Token             Logit
overlooking       -0.08
agement           -0.07
cott              -0.07
Transportation    -0.06
티                -0.06
okrat             -0.06
@test             -0.06
bumper            -0.06
vidět             -0.06
ände              -0.06
POSITIVE LOGITS
Token             Logit
charms            0.08
_direction        0.06
line              0.06
names             0.06
zeit              0.06
Zodiac            0.06
نه                0.06
catal             0.06
lege              0.06
섭                0.06
Activations Density: 0.045%