INDEX
Explanations
expressions of boredom
Negative Logits
davis      -0.44
UCN        -0.44
eventual   -0.41
TCL        -0.40
ثيق        -0.39
daly       -0.39
Buch       -0.38
Davis      -0.38
ıntı       -0.38
Dickson    -0.37
Positive Logits
bored      1.96
Bored      1.86
Bored      1.80
bored      1.65
boredom    1.48
Bore       1.00
Bore       0.84
bore       0.82
abur       0.80
无聊       0.78
Activations Density: 0.003%