INDEX
Explanations
adjectives that express opinions or judgments
phrases expressing opinions or evaluations
Negative Logits
ãĥīãĥ© (ドラ)     -0.74
arthed            -0.68
andise            -0.67
parcels           -0.65
ãĤº (ズ)          -0.64
è¦ļéĨĴ (覚醒)    -0.63
gust              -0.62
dden              -0.62
translation       -0.61
éĹĺ (闘)         -0.60
Positive Logits
underest          0.76
underestimate     0.75
kidding           0.69
overest           0.66
underestimated    0.66
smarter           0.64
worthwhile        0.63
miscon            0.63
ono               0.62
deserved          0.61
Activations Density 0.336%
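
Several entries in the negative logits list (for example `ãĥīãĥ©`) look like mojibake but appear to be raw byte-level BPE token strings. The sketch below assumes the underlying tokenizer uses the GPT-2-style byte-to-character table, which this page does not state. Under that assumption the tokens decode to ordinary Japanese text. The names `bytes_to_unicode` and `decode_token` are illustrative and not part of any dashboard API.

```python
def bytes_to_unicode():
    """Rebuild the byte -> printable-character table used by GPT-2-style byte-level BPE.

    Printable bytes map to themselves; the remaining bytes are assigned
    code points 256, 257, ... in increasing byte order.
    """
    printable = (
        list(range(ord("!"), ord("~") + 1))
        + list(range(ord("¡"), ord("¬") + 1))
        + list(range(ord("®"), ord("ÿ") + 1))
    )
    byte_values = printable[:]
    code_points = printable[:]
    offset = 0
    for b in range(256):
        if b not in printable:
            byte_values.append(b)
            code_points.append(256 + offset)
            offset += 1
    return {b: chr(c) for b, c in zip(byte_values, code_points)}


# Inverse table: displayed character -> original byte value.
UNICODE_TO_BYTE = {ch: b for b, ch in bytes_to_unicode().items()}


def decode_token(token: str) -> str:
    """Map each displayed character back to its byte, then decode as UTF-8.

    Assumption: the token was produced by a GPT-2-style byte-level tokenizer.
    A character outside the table raises KeyError; byte sequences that are
    not valid UTF-8 on their own are shown with replacement characters.
    """
    raw = bytes(UNICODE_TO_BYTE[ch] for ch in token)
    return raw.decode("utf-8", errors="replace")


if __name__ == "__main__":
    for tok in ["ãĥīãĥ©", "ãĤº", "è¦ļéĨĴ", "éĹĺ", "gust"]:
        print(repr(tok), "->", decode_token(tok))
```

Plain ASCII tokens such as `gust` pass through unchanged, so the same function can be applied to every row of both logit lists.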