INDEX
Explanations
phrases related to a strong positive response or approval
Negative Logits
Welsh       -0.73
Dynamics    -0.70
BALL        -0.70
Holmes      -0.65
flank       -0.64
Cruiser     -0.63
snippets    -0.62
canopy      -0.61
spoilers    -0.60
Ballard     -0.59
Positive Logits
pec         1.27
ourced      1.19
ounding     1.18
olver       1.16
umption     1.13
olute       1.13
igned       1.09
olutely     1.09
pell        1.07
ign         1.06
Activations Density: 0.010%