INDEX
Explanations
phrases that express perceptions, evaluations, or opinions
Negative Logits
Token     Logit
أجل       -0.66
Alexei    -0.63
sobran    -0.60
Alexey    -0.60
quên      -0.60
первых    -0.59
mogat     -0.59
eksper    -0.58
πριν      -0.57
latego    -0.56
Positive Logits
Token       Logit
regarded    1.08
treated     0.93
viewed      0.92
Treated     0.82
Viewed      0.80
treating    0.80
trataba     0.76
Treating    0.75
considéré   0.70
treated     0.69
Activations Density 0.341%
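
As a rough illustration of what a density figure like the one above might measure, the sketch below assumes that "activation density" means the fraction of token positions on which the feature fires (activation above a threshold). That definition, the function name activation_density, the threshold default, and the toy data are all assumptions made for illustration; they are not taken from the dashboard itself.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of token positions whose activation exceeds threshold.

    Assumed definition (not confirmed by the source): a feature is
    "active" on a token when its activation is strictly above threshold.
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical toy data: 1000 token positions, 3 of which activate the feature.
toy_activations = [0.0] * 997 + [0.4, 1.2, 2.5]
density = activation_density(toy_activations)
print(f"Activation density: {density:.3%}")
```

Under this reading, a density of 0.341% would mean the feature is active on roughly 3 or 4 of every 1000 tokens in the sampled text.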