INDEX
Explanations
expressions of concern or distress
Negative Logits
Ranked    -0.69
livest    -0.65
eatures   -0.63
perm      -0.62
amins     -0.60
dated     -0.58
oba       -0.58
ut        -0.57
amen      -0.57
eele      -0.56
Positive Logits
ingly     0.86
about     0.81
ABOUT     0.79
dy        0.78
NESS      0.77
aback     0.74
vier      0.72
der       0.71
whel      0.70
enough    0.68
Activations Density 1.693%