INDEX
Explanations
names of specific individuals, likely politicians or other public figures
Negative Logits
utenant         -0.76
olin            -0.74
holes           -0.73
lehem           -0.73
lator           -0.71
anguage         -0.70
glomer          -0.68
virtual         -0.67
interstitial    -0.66
stress          -0.66
Positive Logits
refusal          1.35
insistence       1.32
efforts          1.28
plight           1.25
inability        1.23
actions          1.22
unwillingness    1.21
assertion        1.21
antics           1.20
involvement      1.19
Activations Density 0.234%