Explanations
mentions of actions or events attributed to specific subjects
Negative Logits
nelly      -0.18
loud       -0.15
ively      -0.14
мати       -0.13
KeySpec    -0.13
enan       -0.13
GRESS      -0.13
amiliar    -0.13
alach      -0.13
empo       -0.13
Positive Logits
-products   0.23
means       0.23
products    0.20
gone        0.19
virtue      0.19
-election   0.18
laws        0.17
chance      0.17
/on         0.17
-pass       0.16
Activations Density: 0.213%