INDEX
Explanations
names and locations associated with specific events or narratives
Negative Logits
(“    -0.17
ï¼Ī    -0.16
.Logic    -0.15
Farms    -0.15
ãĤ¥    -0.14
ũi    -0.14
,},↵    -0.14
ãĥĥãĥĹ    -0.14
íĸī    -0.14
!↵↵↵↵    -0.14
Positive Logits
)    0.31
),    0.27
}    0.22
]    0.20
);    0.19
):    0.19
).    0.18
)/    0.18
},    0.18
],    0.18
Activations Density 0.158%