INDEX
Explanations
statements about the attributes or actions of different groups of people
phrases that indicate opinions or beliefs expressed by people
Negative Logits
comes          -0.71
Posts          -0.66
Appearances    -0.62
aign           -0.61
Cumber         -0.59
_.             -0.59
ãĥ¯            -0.57
è£ıè¦ļéĨĴ      -0.57
Wizard         -0.57
Lets           -0.56
Positive Logits
disapprove     1.26
prefer         1.13
regretted      1.08
dislike        1.06
've            1.06
're            1.03
'd             1.01
approve        0.98
intend         0.98
condone        0.97
Activations Density 0.105%