Explanations
instances of the word "though."
Negative Logits
Token     Logit
igo       -0.16
coe       -0.15
nt        -0.15
ydı       -0.14
ube       -0.14
ç·ł       -0.14
gon       -0.14
ìĦ¸       -0.14
enor      -0.14
ãĤ¶ãĥ¼    -0.14
Positive Logits
Token     Logit
odor      0.17
enth      0.16
illing    0.16
ród       0.15
fois      0.15
ãģĬãĤĬ    0.15
oldt      0.14
ackle     0.14
onth      0.14
Khi       0.14
Activations Density: 0.020%
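
Several tokens in the logit lists (for example ãĤ¶ãĥ¼ and ìĦ¸) look like raw vocabulary strings from a byte-level BPE tokenizer rather than readable text. The sketch below assumes the GPT-2-style byte-to-character mapping; the page itself does not say which tokenizer produced these tokens, so treat that as an assumption. The helper names `byte_decoder` and `decode_token` are hypothetical and exist only for this illustration.

```python
def byte_decoder():
    """Build the inverse of a GPT-2-style byte-to-character table (assumed)."""
    # Bytes in these ranges are displayed as the character with the same code point.
    printable = set(range(33, 127)) | set(range(161, 173)) | set(range(174, 256))
    table = {}
    shift = 0
    for b in range(256):
        if b in printable:
            table[chr(b)] = b
        else:
            # Remaining bytes are displayed as code points 256, 257, ... in order.
            table[chr(256 + shift)] = b
            shift += 1
    return table


def decode_token(token):
    """Map each displayed character back to its byte, then decode as UTF-8."""
    table = byte_decoder()
    # A character outside the 256-entry table raises KeyError.
    raw = bytes(table[ch] for ch in token)
    # Tokens that end mid-character decode with U+FFFD replacement marks.
    return raw.decode("utf-8", errors="replace")


if __name__ == "__main__":
    for tok in ["ãĤ¶ãĥ¼", "ãģĬãĤĬ", "ìĦ¸", "ç·ł", "odor"]:
        print(repr(tok), "->", repr(decode_token(tok)))
```

Under that assumption, ãĤ¶ãĥ¼ decodes to ザー, ãģĬãĤĬ to おり, ìĦ¸ to 세, and ç·ł to 締, while plain ASCII tokens such as odor are unchanged. Tokens such as ydı and ród may already be shown in decoded form on the page, so running them through the same mapping is not guaranteed to be meaningful.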