NEGATIVE LOGITS
starring  -0.08
Wellness  -0.08
ละคร  -0.08
Hollywood  -0.07
_FUN  -0.07
住宿  -0.07
Gover  -0.07
_house  -0.07
ाहित  -0.07
بالط  -0.07
POSITIVE LOGITS
robustness  0.15
Robust  0.13
resilient  0.12
insensitive  0.12
robust  0.12
gegenüber  0.11
resilience  0.11
withstand  0.11
against  0.11
against  0.11
Activations Density 0.019%