INDEX
Explanations
proper nouns and names related to politics, government, and international affairs
possessive forms or references to entities and their characteristics
Negative Logits
ournals        -0.83
cloth          -0.72
agues          -0.70
Compare        -0.69
Compare        -0.68
lin            -0.67
lator          -0.67
apart          -0.66
roup           -0.66
cells          -0.65
Positive Logits
insistence     1.30
refusal        1.28
inability      1.28
unwillingness  1.24
involvement    1.23
willingness    1.20
efforts        1.18
penchant       1.17
reluctance     1.16
assertion      1.14
Activations Density 0.288%