INDEX
Explanations
conversational fillers and casual discourse markers
Negative Logits
igy     -0.19
itler   -0.16
اط      -0.16
.tel    -0.14
eway    -0.14
ama     -0.14
foy     -0.14
wg      -0.13
lena    -0.13
Ain     -0.13
Positive Logits
er      0.47
um      0.43
um      0.34
err     0.34
well    0.33
uh      0.33
ah      0.33
shall   0.31
erm     0.30
uh      0.30
Activations Density 0.146%