Explanations
phrases indicating engagement or direction in conversation
Negative Logits
around     -0.07
stitute    -0.06
zim        -0.06
ctrine     -0.06
eri        -0.06
/repos     -0.06
1          -0.06
emi        -0.06
ble        -0.06
illicit    -0.06
Positive Logits
.scalablytyped    0.09
uales             0.07
vail              0.07
pen               0.07
obar              0.07
Ñİк               0.07
.until            0.06
tsy               0.06
Äįen              0.06
(=)               0.06
Activations Density 0.012%