Explanations
conditional phrases indicating contrast or contradiction
Negative Logits
Token        Logit
isia         -0.17
amet         -0.15
íĴĪ          -0.14
argo         -0.13
ancybox      -0.13
orizontal    -0.13
ahat         -0.13
.hr          -0.13
ItemAt       -0.13
erset        -0.13
Positive Logits
Token        Logit
cop          0.18
Cop          0.15
supposed     0.15
Æ°á»Łng      0.15
plenty       0.15
advanced     0.15
OX           0.15
Cair         0.15
IMDb         0.15
monic        0.14
Activations Density 0.170%