INDEX
Explanations
phrases related to historical events and global political affairs, especially those involving countries and leaders
Negative Logits
naissance    -1.05
vernment     -1.03
mileage      -1.02
Reviewer     -0.96
ativity      -0.94
ously        -0.93
ISON         -0.90
slic         -0.89
laundry      -0.89
nces         -0.88
Positive Logits
weet         1.71
arnaev       1.69
hirt         1.48
onic         1.45
omething     1.38
avorite      1.34
arah         1.33
ugar         1.31
ierra        1.30
omp          1.28
Activations Density 1.303%