INDEX
Explanations
references to politics and controversies, especially related to specific events or individuals
Negative Logits
exception      -0.63
Portuguese     -0.60
ASED           -0.58
citation       -0.57
adjunct        -0.57
MENTS          -0.57
submission     -0.57
reinforcement  -0.57
aneously       -0.56
Sweeney        -0.56
Positive Logits
mith           1.78
hift           1.69
pace           1.66
ilver          1.65
chool          1.61
creen          1.61
peed           1.60
hip            1.60
omething       1.58
pring          1.57
Activations Density: 2.036%