INDEX
Explanations
sentences starting with the word "But"
NEGATIVE LOGITS
obyl      -0.67
の        -0.64
pointer   -0.64
kie       -0.62
built     -0.62
irms      -0.61
う        -0.61
サ        -0.61
edu       -0.60
覚醒      -0.59
POSITIVE LOGITS
tons          1.38
alas          1.29
beware        1.01
chers         1.00
hey           0.93
unlike        0.92
chery         0.91
luckily       0.90
fortunately   0.90
tery          0.88
Activations Density 0.467%
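
The page does not say how the density figure is computed. A minimal sketch, assuming it means the fraction of sampled token positions on which the feature's activation is above zero; the function name `activation_density`, the `threshold` parameter, and the toy data are hypothetical and not taken from the dashboard:

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of token positions whose activation exceeds threshold.

    Assumption: "density" means the share of sampled tokens on which the
    feature fires at all. The dashboard itself does not state its definition.
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Toy data (made up for illustration): 1000 token positions, 5 of them active.
toy = [0.0] * 995 + [1.2, 0.4, 3.1, 0.9, 2.2]
print(f"{activation_density(toy):.3%}")
```

Under that reading, 0.467% would mean the feature fires on roughly 1 in 214 sampled tokens.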