INDEX
Explanations
phrases indicating comparison or similarity
Negative Logits
Token            Logit
ModelExpression  -0.69
ſtand            -0.54
íč               -0.53
chofe            -0.52
出               -0.49
οποία            -0.48
anſ              -0.48
Trabal           -0.48
uſed             -0.48
ſta              -0.47
Positive Logits
Token        Logit
a            1.23
an           0.97
part         0.73
"]="         0.72
]='\         0.67
gæ           0.67
]=="         0.64
=?";         0.64
полноцен     0.64
Bestandteil  0.63
Activations Density 0.239%