Explanations
specific references to cultural products and artistic expressions
Negative Logits
mandate   -0.15
itta      -0.15
że        -0.14
fund      -0.14
jav       -0.14
tap       -0.14
avia      -0.13
badly     -0.13
ring      -0.13
iem       -0.13
Positive Logits
.byte      0.15
zag        0.15
hoo        0.14
ubar       0.14
noqa       0.14
pto        0.14
863        0.14
atan       0.13
ISC        0.13
IAM        0.13
Activation density: 0.300%