INDEX
Explanations
expressions of disappointment and unmet expectations
Negative Logits
gan       -0.15
agan      -0.15
uer       -0.14
inger     -0.14
UTE       -0.14
atus      -0.13
iga       -0.13
undred    -0.13
apolis    -0.13
iken      -0.13
Positive Logits
expected        0.48
expectation     0.45
expecting       0.45
expectations    0.43
expected        0.38
EXPECT          0.37
Expected        0.37
expect          0.37
expect          0.36
expects         0.36
Activations Density 0.170%