INDEX
Explanations
instances of encouragement or supportive language
Negative Logits
بيها            -0.84
id              -0.68
Bae             -0.66
Bae             -0.66
en              -0.66
println         -0.64
machte          -0.64
quad            -0.63
Koz             -0.62
Фа              -0.62
Positive Logits
couragement     1.15
couraged        1.14
discouraged     1.05
encouragement   1.02
couraging       0.98
discourage      0.95
encouragement   0.94
multer          0.93
encouraged      0.91
Encourage       0.88
Activations Density: 0.008%