INDEX
Explanations
phrases that include the speaker's name and self-identification
Negative Logits
ysl          -0.20
axed         -0.15
uard         -0.15
lags         -0.14
amburg       -0.14
hood         -0.14
crete        -0.14
fect         -0.14
persu        -0.13
àng          -0.13
Positive Logits
oppins        0.16
式            0.15
Expose        0.15
布            0.15
Introduced    0.14
ِب            0.14
اران          0.14
oğ            0.14
edd           0.14
602           0.14
Activation Density: 0.082%