INDEX
Explanations
proper nouns and names, particularly related to news and politics
NEGATIVE LOGITS
Els        -0.74
Load       -0.72
oard       -0.66
Agric      -0.62
ories      -0.61
Ire        -0.61
Dominion   -0.60
Wem        -0.60
Georg      -0.60
Region     -0.59
POSITIVE LOGITS
Jr         1.39
Sr         1.11
III        1.06
aka        0.93
famously   0.93
agher      0.89
Jr         0.86
's         0.86
himself    0.82
ovich      0.82
Activations Density: 2.050%