INDEX
Explanations
references to historical leaders and their roles in empire-building contexts
Negative Logits
Token     Logit
strup     -0.18
ems       -0.15
imli      -0.15
pbs       -0.15
âĢĮÛĮ     -0.14
eba       -0.14
Jeg       -0.14
stract    -0.14
BITS      -0.13
Twist     -0.13
Positive Logits
Token     Logit
[c        0.24
Template  0.23
[a        0.22
,[        0.22
Template  0.21
[         0.21
:[        0.21
.[        0.19
,[        0.18
[n        0.18
Activations Density 0.117%