Explanations
questions and phrases indicating inquiry or curiosity
Negative Logits
Token    Logit
what     -0.19
.what    -0.15
frank    -0.15
what     -0.14
476      -0.14
zb       -0.14
ator     -0.14
ovah     -0.13
uz       -0.13
åı       -0.13
Positive Logits
Token    Logit
-count   0.17
onec     0.15
iyat     0.15
lesson   0.14
place    0.14
/place   0.14
motiv    0.14
meaning  0.14
ä¸ģ      0.14
aile     0.14
Activations Density: 0.109%