Explanations
pronoun wanting or recommending
Negative Logits
truc        0.46
deng        0.40
ho          0.39
del         0.38
mne         0.37
ade         0.35
Ho          0.35
치          0.35
adequacy    0.35
mode        0.34
Positive Logits
pending     0.45
highlight   0.42
sottoline   0.41
تقول        0.41
popularly   0.41
よろしく    0.41
平常        0.40
পড়ার       0.39
காற்று      0.39
quiero      0.39
Activations Density 0.003%