INDEX
Explanations
negative judgments about behavior or actions
Negative Logits
NameInMap  -0.49
تحد  -0.47
sund  -0.47
hten  -0.46
PageContext  -0.46
change  -0.45
gaz  -0.45
prove  -0.45
лам  -0.45
?  -0.44
Positive Logits
dignité  0.64
Efq  0.64
primaire  0.60
原始内容存档于  0.59
μως  0.59
chré  0.59
honte  0.59
例文帳に追加  0.59
Shakspeare  0.58
dàng  0.57
Activations Density: 0.117%