Explanations
expressions that indicate comparisons or likeness
Negative Logits
dafx         -0.56
faptul       -0.53
كومونز       -0.53
asupra       -0.47
oamen        -0.47
huéspedes    -0.46
oamenii      -0.45
vraag        -0.45
Signalez     -0.44
correre      -0.44
Positive Logits
appear        0.54
Looks         0.53
Looks         0.52
Meksiku       0.49
appearance    0.47
appears       0.47
looks         0.47
resemble      0.45
выгля         0.44
Appears       0.44
Activations Density 0.139%