Explanations
phrases that express doubt or challenge conventional beliefs
questioning and doubting
Negative Logits
fix            -0.31
AutoScaleMode  -0.30
Vergnügen      -0.30
promociones    -0.29
hän            -0.28
hoffe          -0.28
fenómeno       -0.27
protectora     -0.27
nélk           -0.27
fix            -0.26
Positive Logits
questioning    0.88
questioned     0.86
cuestion       0.85
doubting       0.82
Chall          0.77
challenged     0.77
Challenging    0.76
Chall          0.74
质疑           0.74
challenged     0.72
Activations Density: 0.164%
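As a rough illustration of what a density figure like this may represent, the sketch below assumes it is the fraction of sampled tokens on which the feature fires above zero. That definition, the function name `activation_density`, and the sample values are all assumptions made for illustration; the page itself does not state how the number was computed.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of tokens whose activation exceeds `threshold`.

    Assumption: "density" means the share of sampled tokens on which the
    feature is active. This is a hypothetical helper, not the dashboard's code.
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical sample: the feature fires on 2 of 1000 tokens.
sample = [0.0] * 998 + [1.7, 0.4]
density = activation_density(sample)
print(f"{density:.3%}")
```

Under this reading, a density of 0.164% would mean the feature is active on roughly 1 or 2 tokens per 1000 sampled.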