Explanations
phrases expressing opinions or beliefs
assertive statements of personal opinion or belief
Negative Logits
ゴン  -0.76
闘  -0.76
unavailable  -0.74
覚醒  -0.70
魔  -0.69
ドラ  -0.67
alid  -0.67
none  -0.63
iann  -0.63
ズ  -0.62
Positive Logits
underrated  1.11
underest  0.96
underestimate  0.93
deserved  0.93
underestimated  0.92
fair  0.89
ought  0.89
overest  0.88
beh  0.87
unfairly  0.87
Activations Density 0.327%