INDEX
Explanations
phrases enclosed by quotation marks
instances of quotation marks, indicating direct speech or citations
Negative Logits
Alger       -0.74
derby       -0.69
posting     -0.69
publishing  -0.66
fury        -0.65
affiliate   -0.65
Gaal        -0.65
Crane       -0.65
seasoned    -0.64
shaking     -0.64
Positive Logits
almost      1.10
every       1.09
extremely   1.03
little      1.02
highly      1.01
moderate    1.00
self        1.00
super       1.00
never       0.99
their       0.99
Activations Density 0.134%
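
The page does not say how the density figure is computed. A common reading is the share of sampled tokens on which the feature fires at all. The sketch below illustrates that reading only; the function name `activation_density`, the zero threshold, and the sample data are hypothetical and are not taken from this page.

```python
def activation_density(activations, threshold=0.0):
    """Return the percentage of tokens whose activation exceeds `threshold`.

    Assumption: "density" means the fraction of tokens with a nonzero
    (above-threshold) activation. The source page does not confirm this.
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    # Count the tokens on which the feature is considered active.
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


# Hypothetical sample: 134 active tokens out of 100,000 sampled tokens.
sample = [1.0] * 134 + [0.0] * 99_866
density = activation_density(sample)
print(f"{density:.3f}%")
```

Under this reading, a density of 0.134% would mean the feature is active on roughly 1 in 750 tokens, which is typical of a sparse feature.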