Explanations
phrases that express comparisons or similarities
Negative Logits
iman     -0.18
amp      -0.16
iyim     -0.16
inya     -0.15
inas     -0.15
ulur     -0.14
mtree    -0.14
orang    -0.14
amax     -0.14
iyon     -0.14
Positive Logits
referrer     0.14
nhau         0.14
KeyPressed   0.13
aid          0.13
?url         0.13
ública       0.13
ghi          0.13
-fw          0.13
tess         0.12
aug          0.12
Activations Density: 0.020%