INDEX
Explanations
interrogative language
Negative Logits
laure     -0.77
çīĪ       -0.66
©¶æ       -0.63
tyard     -0.60
rique     -0.59
¬¼        -0.58
roth      -0.57
scrib     -0.57
Corona    -0.57
roller    -0.56
Positive Logits
how          1.26
HOW          1.16
whether      1.16
WHY          1.14
why          1.10
how          1.06
why          1.05
whether      1.05
whereabouts  1.01
How          0.91
Activations Density 0.251%