INDEX
Explanations
mentions of authority figures or entities
words that emphasize importance or significance
Negative Logits
reports      -0.70
Vaughan      -0.70
Rounds       -0.69
credits      -0.69
Emails       -0.68
reports      -0.67
upgrades     -0.66
ographs      -0.66
alerts       -0.65
Originally   -0.65
Positive Logits
rouse          1.15
lot            1.06
usterity       1.04
bunch          0.96
finite         0.94
particular     0.93
versive        0.93
plethora       0.92
few            0.91
hypothetical   0.91
Activations Density: 0.905%