INDEX
Explanations
yes/no questions and related inquiry patterns
Negative Logits
Token     Logit
hof       -0.16
sonst     -0.14
lette     -0.14
айд       -0.14
ikk       -0.14
isten     -0.13
pond      -0.13
ensch     -0.13
agt       -0.13
atable    -0.13
Positive Logits
Token     Logit
answer    0.21
Answer    0.16
Answer    0.16
uner      0.16
答        0.15
answered  0.15
дет       0.15
ningen    0.15
swer      0.15
unas      0.15
Activations Density 0.042%
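
Token strings in listings like the ones above are sometimes shown in the raw byte-level form that GPT-2-style BPE vocabularies use, for example `çŃĶ` for 答 or a partly decoded `деÑĤ` for дет. The following is a minimal sketch of how such strings can be mapped back to readable text. It assumes the underlying tokenizer uses the GPT-2 byte-to-character table, which this page does not state; `decode_token` is a hypothetical helper name, and the pass-through for characters outside the table is a heuristic for partly decoded strings.

```python
def bytes_to_unicode() -> dict[int, str]:
    """Build the GPT-2 style table mapping each byte (0-255) to a printable character."""
    # Bytes that are already printable map to themselves.
    bs = (
        list(range(ord("!"), ord("~") + 1))
        + list(range(0xA1, 0xAC + 1))
        + list(range(0xAE, 0xFF + 1))
    )
    cs = bs[:]
    n = 0
    # Every remaining byte is shifted to a code point at 256 and above.
    for b in range(256):
        if b not in bs:
            bs.append(b)
            cs.append(256 + n)
            n += 1
    return {b: chr(c) for b, c in zip(bs, cs)}


# Inverse table: printable character -> original byte value.
CHAR_TO_BYTE = {ch: b for b, ch in bytes_to_unicode().items()}


def decode_token(raw: str) -> str:
    """Map a byte-level token string back to UTF-8 text (hypothetical helper).

    Assumption: characters not in the table were already decoded upstream,
    so they are passed through as their own UTF-8 bytes.
    """
    out = bytearray()
    for ch in raw:
        if ch in CHAR_TO_BYTE:
            out.append(CHAR_TO_BYTE[ch])
        else:
            out.extend(ch.encode("utf-8"))
    return out.decode("utf-8", errors="replace")


if __name__ == "__main__":
    for raw in ["çŃĶ", "деÑĤ", "answer"]:
        print(repr(raw), "->", repr(decode_token(raw)))
```

Plain ASCII tokens such as `answer` pass through unchanged, so the helper can be applied to a whole token column. A token that legitimately contains a Latin-1 character such as `ç` would be misread by this heuristic, so the result is best treated as a reading aid rather than a faithful inverse.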