INDEX
Explanations
phrases that indicate reference to prior statements or discussions
Negative Logits
acho          -0.17
Carn          -0.16
negatives     -0.15
Progressive   -0.14
ØŃاج          -0.14
oge           -0.14
acades        -0.14
ires          -0.14
otte          -0.14
at            -0.14
Positive Logits
imdi          0.16
kolo          0.16
pend          0.15
pig           0.15
.Entry        0.15
">//          0.14
cü            0.14
uzzi          0.14
rvine         0.14
yasal         0.14
Activations Density: 0.063%