Explanations
phrases indicating strong emotional reactions or assessments, particularly related to disappointment or concern
Negative Logits
mektedir        -0.98
INSEE           -0.69
maktadır        -0.69
CANNOT          -0.60
KeyEvent        -0.60
才可以          -0.60
Diweddarwch     -0.60
rsiniz          -0.60
mıştır          -0.60
AutoScaleMode   -0.58
Positive Logits
gotta           1.09
outta           1.01
تانيه           1.01
gonna           1.00
somethin        0.97
GONNA           0.88
talkin          0.86
nothin          0.84
Gonna           0.82
Gimme           0.82
Activations Density 0.383%
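
The density figure above is not defined on this page. A common reading, assumed here, is the fraction of token positions at which the feature's activation is nonzero. The sketch below illustrates that reading only; the function name `activation_density`, the `threshold` parameter, and the sample activations are all hypothetical and not taken from the source.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of positions whose activation exceeds threshold.

    Assumption: "density" means the share of token positions where the
    feature fires at all; the source page does not state its definition.
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical activations for one feature over 1000 token positions:
# the feature fires on 4 of them and is silent on the rest.
sample = [0.0] * 996 + [0.7, 1.2, 0.3, 2.5]

density = activation_density(sample)
print(f"{density:.3%}")  # prints 0.400%
```

Under this reading, a density of 0.383% would mean the feature is active on roughly 4 of every 1000 tokens in whatever corpus was sampled.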