INDEX
Explanations
phrases emphasizing the significance of various subjects or themes
Negative Logits
PreferredItem    -0.87
المشاركات    -0.86
insee            -0.73
mybatisplus      -0.66
oeufs            -0.65
ysmal            -0.64
Worse            -0.64
MotionEvent      -0.64
ronym            -0.63
winkel           -0.62
Positive Logits
gradually        0.85
Gradually        0.83
gradual          0.74
preferences      0.70
importance       0.67
preference       0.62
Preferences      0.55
dần              0.55
Preference       0.54
significance     0.51
Activations Density: 0.036%