Explanations
questions that express curiosity or seek clarification
Negative Logits
ay          -0.18
culus       -0.16
.external   -0.16
iferay      -0.15
ypi         -0.15
aba         -0.15
æ©Ł         -0.15
loff        -0.15
ando        -0.15
est         -0.14
Positive Logits
why         0.18
곡          0.15
why         0.15
qid         0.15
unately     0.15
tuz         0.14
Mentor      0.14
uns         0.14
rendered    0.14
unting      0.14
Activations Density: 0.077%
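
To put the density figure in concrete terms, here is a minimal sketch. It assumes that "density" means the fraction of sampled tokens on which the feature has a nonzero activation; the page itself does not define the term. The helper name `expected_active_tokens` and the sample size of 100,000 tokens are hypothetical and chosen only for illustration.

```python
def expected_active_tokens(density_percent: float, n_tokens: int) -> float:
    """Expected number of tokens on which the feature is active.

    Assumes density is the percentage of tokens with nonzero activation
    (an assumption; the page does not state its definition).
    """
    return density_percent / 100.0 * n_tokens


density_percent = 0.077   # value reported on the page
sample_size = 100_000     # hypothetical sample size, for illustration only

# Roughly 77 active tokens per 100,000 under the stated assumption.
print(round(expected_active_tokens(density_percent, sample_size)))
```

Under that reading, the feature is sparse: it fires on fewer than one token in a thousand.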