INDEX
Explanations
Proper nouns, specifically names of individuals in news reports or incidents
Instances of the preposition "of"
Negative Logits
functioning   -0.77
attribute     -0.72
disse         -0.72
evaluates     -0.68
derog         -0.68
dictate       -0.68
explan        -0.66
vulner        -0.66
pard          -0.66
disag         -0.65
Positive Logits
Anaheim        1.00
Lancaster      0.98
Queens         0.98
Stamford       0.97
Bethlehem      0.95
Wilmington     0.94
Calgary        0.94
Rochester      0.94
Omaha          0.93
Auckland       0.93
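The two logit lists show the tokens whose output logits this feature most raises and most lowers. A common way to build such lists, assumed here and not confirmed by this page, is to take the dot product of the feature's decoder direction with each token's unembedding vector and then sort. The sketch below uses hypothetical helper names (`logit_effects`, `top_k`) and invented toy vectors. The page does not say whether its scores are rescaled.

```python
from typing import Dict, List, Tuple

def logit_effects(decoder_vec: List[float],
                  unembed: Dict[str, List[float]]) -> Dict[str, float]:
    """Project the feature's decoder direction onto each token's unembedding vector."""
    return {tok: sum(d * u for d, u in zip(decoder_vec, vec))
            for tok, vec in unembed.items()}

def top_k(effects: Dict[str, float],
          k: int) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return the k most positive and the k most negative (token, effect) pairs."""
    ranked = sorted(effects.items(), key=lambda kv: kv[1])
    negative = ranked[:k]          # smallest (most negative) effects first
    positive = ranked[::-1][:k]    # largest (most positive) effects first
    return positive, negative

# Hypothetical 2-dimensional toy example; these vectors are invented, not model weights.
decoder_vec = [1.0, -0.5]
unembed = {
    "Omaha": [0.9, 0.1],
    "Calgary": [0.8, 0.2],
    "attribute": [-0.6, 0.4],
    "dictate": [-0.5, 0.5],
}
positive, negative = top_k(logit_effects(decoder_vec, unembed), k=2)
print(positive)  # Omaha first, then Calgary
print(negative)  # attribute first, then dictate
```

Sorting once and slicing from both ends yields both tables from a single pass over the vocabulary.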
Activation Density: 0.070%
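Activation density is presumably the fraction of sampled tokens on which the feature fires. The page states neither the sample nor the firing threshold, so the sketch below assumes a strict `> 0` threshold and uses a hypothetical helper, `activation_density`, on a synthetic sample. At 0.070%, the feature fires on roughly 7 of every 10,000 tokens.

```python
from typing import Sequence

def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Fraction of tokens whose activation exceeds the (assumed) threshold."""
    if not activations:
        raise ValueError("need at least one activation")
    firing = sum(1 for a in activations if a > threshold)
    return firing / len(activations)

# Synthetic sample: 7 firing tokens out of 10,000.
acts = [0.0] * 10_000
for i in range(7):
    acts[i * 1000] = 1.5  # arbitrary positive activation value
density = activation_density(acts)
print(f"{density:.3%}")  # 0.070%
```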