INDEX
Explanations
references to broader social, economic, and communal themes
Negative Logits
ylvania      -0.15
ror          -0.15
sel          -0.14
ople         -0.14
etc          -0.14
eca          -0.14
yan          -0.14
istrovství   -0.14
Shard        -0.13
uilder       -0.13
Positive Logits
wider        0.24
broader      0.22
context      0.20
anging       0.20
society      0.18
larger       0.17
bigger       0.17
/general     0.17
-reaching    0.15
greater      0.15
Activations Density: 0.033%