Explanations
technical and instructional language
Negative Logits
aborted       -0.71
bundled       -0.65
suppressed    -0.62
renewed       -0.62
inflated      -0.61
fading        -0.61
flares        -0.61
questioned    -0.61
laure         -0.60
tuning        -0.60
Positive Logits
prison        1.05
date          1.04
advertising   1.02
acqu          1.01
equ           0.99
existence     0.99
inf           0.99
distance      0.99
sent          0.98
order         0.97
Activations Density: 0.040%