INDEX
Explanations
phrases related to questions or questioning
Negative Logits
icine      -0.16
uation     -0.16
ummings    -0.16
ication    -0.15
kker       -0.15
temp       -0.15
.tell      -0.14
usion      -0.14
ration     -0.14
orta       -0.14
Positive Logits
naires     0.34
naire      0.31
aire       0.24
stellung   0.19
stell      0.17
aires      0.17
stown      0.16
-answer    0.15
rcode      0.15
ÑĭÑģ       0.15
Activations Density 0.059%