Explanations
transitional phrases and contrasting conjunctions
Negative Logits
ALI          -0.14
alli         -0.14
awei         -0.13
ogan         -0.13
ali          -0.13
culate       -0.13
subjective   -0.13
°            -0.13
.Cart        -0.13
illy         -0.13
Positive Logits
only          0.19
differently   0.17
주의          0.16
upside        0.16
it            0.16
nowhere       0.15
それは        0.15
ONLY          0.14
onio          0.14
mium          0.14
Activations Density 0.206%