INDEX
Explanations
phrases that indicate news reporting or information sources
Negative Logits
ekim        -0.18
iez         -0.16
haft        -0.15
omu         -0.15
ики         -0.15
idd         -0.14
Wik         -0.14
/commons    -0.14
lue         -0.14
bst         -0.14
Positive Logits
ako         0.16
contract    0.14
ims         0.14
specialize  0.14
insi        0.13
Cellular    0.13
Cla         0.13
OUCH        0.13
irling      0.13
Asc         0.13
Activations Density: 0.010%