INDEX
Explanations
mentions of names of people and organizations
Negative Logits
thood         -0.90
eno           -0.85
usa           -0.84
ceive         -0.82
beforehand    -0.80
besides       -0.79
Communism     -0.78
udo           -0.74
iffe          -0.74
veland        -0.74
Positive Logits
latter        1.56
largest       1.40
same          1.38
fastest       1.38
biggest       1.37
hardest       1.36
smallest      1.36
longest       1.35
oldest        1.34
earliest      1.32
Activations Density: 4.636%