Explanations
personal pronouns and relational phrases
expressions of personal connection and involvement
Negative Logits
Token         Logit
nesota        -0.83
ancouver      -0.74
lihood        -0.72
aults         -0.70
reens         -0.69
itton         -0.66
okemon        -0.66
neapolis      -0.65
Marginal      -0.65
vernight      -0.64
Positive Logits
Token         Logit
Promise       0.81
udic          0.78
gladly        0.76
'm            0.76
ona           0.74
bara          0.72
rish          0.69
recommend     0.69
am            0.68
promise       0.66
Activations Density: 0.335%