Explanations
references to political and economic terms, actions, and figures
Negative Logits
theless         -0.64
Applicant       -0.59
scrut           -0.53
oneself         -0.53
\\\\\\\\        -0.53
Topic           -0.51
Dot             -0.50
Naz             -0.50
WARD            -0.49
ãĥ¼ãĥĨ          -0.49
Positive Logits
irs             0.72
ocratic         0.64
aughters        0.64
ultimate        0.63
own             0.63
uga             0.62
eenth           0.62
cient           0.62
ificant         0.61
counterparts    0.61
Activations Density: 10.537%