INDEX
Explanations
references to historical political terms and concepts
Negative Logits
####     -0.24
###      -0.22
#####    -0.22
##       -0.20
###↵     -0.16
'**      -0.15
_##      -0.15
**       -0.15
#        -0.15
\`       -0.15
Positive Logits
↑          0.33
^          0.32
Template   0.26
Wik        0.26
^          0.25
↑          0.25
Template   0.25
^↵         0.24
:^         0.23
.^         0.23
Activations Density 0.019%