Explanations
references to social or economic decline and its consequences
Negative Logits
Token             Logit
anza              -0.18
olen              -0.17
idth              -0.16
ouden             -0.15
swers             -0.15
efon              -0.15
achten            -0.14
Redistributions   -0.14
keley             -0.14
esub              -0.14
Positive Logits
Token             Logit
impl              0.35
collapse          0.32
unravel           0.31
collapsed         0.31
nose              0.31
spir              0.31
nos               0.31
unr               0.31
cr                0.30
crater            0.30
Activations Density: 0.126%