Explanations
common phrases or expressions indicating comparisons or contrasts
Negative Logits
uj      -0.15
дан     -0.15
эта     -0.15
ist     -0.15
تلك     -0.15
adele   -0.14
字      -0.14
эту     -0.14
ung     -0.13
ungs    -0.13
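Several tokens in these logit lists (for example `эта`, `字`, `vậy`) come from a byte-level BPE vocabulary. Raw exports of GPT-2-style vocabularies store each token with every UTF-8 byte remapped to a printable character, which is why such tokens can surface as strings like `váºŃy` instead of `vậy`. The sketch below mirrors the `bytes_to_unicode` table from GPT-2's tokenizer and inverts it. It assumes this model uses that scheme, and `decode_token` is a hypothetical helper name.

```python
from typing import Dict


def bytes_to_unicode() -> Dict[int, str]:
    """GPT-2-style map from each byte value to a printable Unicode character."""
    # Printable ASCII and most printable Latin-1 bytes keep their own code point.
    bs = (
        list(range(ord("!"), ord("~") + 1))
        + list(range(ord("\u00a1"), ord("\u00ac") + 1))
        + list(range(ord("\u00ae"), ord("\u00ff") + 1))
    )
    cs = bs[:]
    n = 0
    # Remaining bytes (controls, space, 0x7F-0xA0, 0xAD) are shifted to 256 + n.
    for b in range(256):
        if b not in bs:
            bs.append(b)
            cs.append(256 + n)
            n += 1
    return dict(zip(bs, (chr(c) for c in cs)))


# Inverse table: printable character -> original byte value.
BYTE_DECODER = {ch: b for b, ch in bytes_to_unicode().items()}


def decode_token(token: str) -> str:
    """Turn a byte-level BPE token string back into readable UTF-8 text."""
    raw = bytes(BYTE_DECODER[ch] for ch in token)
    return raw.decode("utf-8", errors="replace")


print(decode_token("v\u00e1\u00ba\u0143y"))  # the raw form displayed as váºŃy
print(decode_token("\u00e5\u0143\u0139"))    # the raw form displayed as åŃĹ
print(repr(decode_token("\u0120THAT")))      # a leading Ġ encodes a space
```

Decoding uses `errors="replace"` because a single token may hold only part of a multi-byte character; such fragments then show up as U+FFFD rather than raising.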
Positive Logits
eso     0.42
cela    0.35
isso    0.35
ça      0.35
vậy     0.32
ello    0.28
THAT    0.28
ذلك     0.28
đó      0.25
esto    0.25
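Dashboards of this kind commonly obtain the positive and negative logits by projecting a feature's decoder direction onto the model's unembedding matrix and ranking the vocabulary by the result. The sketch below assumes exactly that, ignoring any final LayerNorm or bias terms. It uses a hypothetical toy vocabulary, and `logit_effects` and `top_and_bottom` are illustrative names, not a dashboard API.

```python
from typing import Dict, List, Tuple

Ranked = List[Tuple[str, float]]


def logit_effects(decoder_dir: List[float], unembed: Dict[str, List[float]]) -> Dict[str, float]:
    """Dot product of the feature's decoder direction with each token's unembedding vector."""
    return {
        tok: sum(d * u for d, u in zip(decoder_dir, vec))
        for tok, vec in unembed.items()
    }


def top_and_bottom(effects: Dict[str, float], k: int) -> Tuple[Ranked, Ranked]:
    """Return the k most-promoted and the k most-suppressed tokens."""
    ranked = sorted(effects.items(), key=lambda kv: kv[1])
    positive = list(reversed(ranked))[:k]  # largest values first
    negative = ranked[:k]                  # most negative values first
    return positive, negative


# Hypothetical 2-dimensional model with a 4-token vocabulary.
decoder_dir = [1.0, 0.5]
unembed = {
    " that": [0.4, 0.2],   # effect about 0.5
    " esto": [0.3, 0.0],   # effect about 0.3
    " ung": [-0.1, -0.2],  # effect about -0.2
    " ist": [0.0, -0.2],   # effect about -0.1
}
positive, negative = top_and_bottom(logit_effects(decoder_dir, unembed), k=2)
print(positive)
print(negative)
```

Sorting the full vocabulary is fine for a sketch. With a real vocabulary of tens of thousands of tokens, a partial selection such as `heapq.nlargest` avoids the full sort.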
Activation Density: 0.301%
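Activation density is the share of token positions on which the feature fires. A value of 0.301% therefore means the feature is active on roughly 3 of every 1,000 tokens. A minimal sketch, assuming "fires" means an activation strictly above zero (the threshold a given dashboard uses may differ); `activation_density` is a hypothetical helper name.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Percentage of token positions whose activation exceeds `threshold`."""
    if not activations:
        raise ValueError("need at least one activation")
    fired = sum(1 for a in activations if a > threshold)
    return 100.0 * fired / len(activations)


# Hypothetical sample: 3 active positions out of 1,000.
acts = [0.0] * 997 + [1.2, 0.4, 2.5]
print(f"{activation_density(acts):.3f}%")  # 0.300%
```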