INDEX
Explanations
phrases indicating likelihood or possibility
phrases indicating uncertainty or speculation
Negative Logits
rouse       -0.75
avorite     -0.73
orem        -0.73
iling       -0.72
otos        -0.72
irling      -0.69
ategory     -0.69
izons       -0.68
aving       -0.68
]+          -0.67
Positive Logits
doubtful    0.80
unclear     0.77
probable    0.74
Ĥİ          0.72
unfair      0.71
abundantly  0.66
rils        0.66
ril         0.64
Frameworks  0.64
clear       0.63
Activations Density 0.069%
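
As a rough illustration of what a density figure like the one above usually means, the sketch below treats activation density as the fraction of token positions on which the feature fires. It assumes "fires" means an activation strictly above zero. The function name `activation_density`, the threshold, and the sample data are hypothetical and are not taken from this page or from any particular tool.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of token positions whose activation exceeds threshold.

    Assumption: a feature "fires" on a token when its activation is
    strictly greater than `threshold` (0.0 by default).
    """
    if not activations:
        raise ValueError("need at least one activation value")
    fired = sum(1 for a in activations if a > threshold)
    return fired / len(activations)


# Hypothetical sample: 100,000 token positions, 69 of which fire.
acts = [0.0] * 99_931 + [1.5] * 69
density = activation_density(acts)
print(f"{density:.3%}")
```

Under that reading, a density of 0.069% corresponds to the feature firing on roughly 69 out of every 100,000 tokens.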