Explanations
Discussions related to political and economic systems
Negative Logits
orts        -0.75
icable      -0.65
ousel       -0.63
commun      -0.62
eks         -0.62
iple        -0.60
ruck        -0.60
oyd         -0.60
level       -0.59
trainer     -0.59
Positive Logits
_-           1.23
perhaps      1.17
especially   1.17
————         1.15
particularly 1.12
————————     1.09
including    1.07
something    1.05
albeit       1.03
that         1.01
Activation density: 1.900%