INDEX
Explanations
proper nouns
specific nouns and terms that indicate entities or organizations
Negative Logits
Token             Logit
Lives             -1.07
Accountability    -1.04
Lines             -1.00
Blocks            -0.98
Lover             -0.95
Characters        -0.93
Clause            -0.93
Away              -0.91
Thing             -0.91
Ones              -0.91

Positive Logits
Token             Logit
tv                 0.79
franc              0.76
democr             0.75
commons            0.74
ensis              0.73
fortunately        0.73
academy            0.72
arians             0.71
united             0.70
republican         0.69
Activations Density: 0.767%
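
The page does not say how the activation density figure is computed. A common convention for feature dashboards is the fraction of token positions on which the feature's activation is above zero. The sketch below follows that assumption. The function name, the threshold parameter, and the sample data are all hypothetical, and the sample is synthetic, sized only so that the result lands on the same percentage shown above.

```python
def activation_density(activations, threshold=0.0):
    """Fraction of token positions where the activation exceeds `threshold`.

    Assumption: "density" means the share of positions on which the
    feature fires at all; the source page does not confirm this definition.
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Synthetic illustration: 100,000 token positions, 767 of them active.
sample = [1.0] * 767 + [0.0] * (100_000 - 767)
print(f"{activation_density(sample):.3%}")  # prints 0.767%
```

Under this reading, a density of 0.767% means the feature fires on fewer than one token position in a hundred.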