INDEX
Explanations
sentences involving admiration or positive sentiment towards qualities or actions of people
expressions of opinion about people and their characteristics
Negative Logits
etheless         -0.67
*.               -0.49
Beir             -0.47
ゴン             -0.45
evidence         -0.45
Frie             -0.43
アル             -0.43
appropriately    -0.42
amera            -0.42
issance          -0.42
Positive Logits
however              0.52
tho                  0.50
alot                 0.43
though               0.43
!)                   0.41
mag                  0.40
natureconservancy    0.40
                     0.39
lay                  0.39
?)                   0.38
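Lists like the two above are commonly produced by projecting a feature's output direction through the model's unembedding matrix and keeping the lowest and highest scoring tokens. The sketch below illustrates that idea on toy data. It is an assumption about how this dashboard derives its numbers, not something the source confirms; `vocab`, `unembed`, and `direction` are hypothetical values chosen only for illustration.

```python
def logit_effects(direction, unembed, vocab):
    """Score each token by the dot product of its unembedding row with the direction."""
    scores = []
    for token, row in zip(vocab, unembed):
        score = sum(d * w for d, w in zip(direction, row))
        scores.append((token, score))
    return scores


def top_and_bottom(scores, k):
    """Return the k most negative and the k most positive (token, score) pairs."""
    ranked = sorted(scores, key=lambda pair: pair[1])
    negative = ranked[:k]        # most negative first
    positive = ranked[::-1][:k]  # most positive first
    return negative, positive


# Hypothetical toy vocabulary with a 2-dimensional unembedding matrix.
vocab = ["however", "tho", "evidence", "appropriately"]
unembed = [[1.0, 0.0], [0.5, 0.5], [-1.0, 0.0], [0.0, -1.0]]
direction = [0.5, 0.1]

scores = logit_effects(direction, unembed, vocab)
negative, positive = top_and_bottom(scores, k=2)
for token, score in negative + positive:
    print(f"{token:<15}{score:+.2f}")
```

Under this reading, a positive value means the feature pushes the model toward predicting that token next, and a negative value means it pushes away from it.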
Activations Density 2.691%
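Activation density is usually read as the fraction of sampled tokens on which the feature fires at all. The sketch below computes it under that assumed definition; the source does not say which threshold or sample the 2.691% figure is based on, and `activation_density` and `sample` are hypothetical names and data.

```python
def activation_density(activations, threshold=0.0):
    """Fraction of activations strictly above the threshold."""
    if not activations:
        raise ValueError("need at least one activation value")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical sample: 100 token positions, 3 of them with a nonzero activation.
sample = [0.0] * 97 + [0.8, 1.3, 0.2]
density = activation_density(sample)
print(f"{density:.3%}")  # prints 3.000%
```

A density near 2.7% would mean the feature is active on roughly one token in 37, which is consistent with a feature tied to a specific kind of phrasing rather than to general text.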