INDEX
Explanations
repetitive phrases or articles
Negative Logits
thood          -0.72
iffe           -0.72
leeve          -0.70
advertising    -0.64
IDs            -0.63
-$             -0.61
egu            -0.61
because        -0.60
iscover        -0.60
suppose        -0.58
Positive Logits
ses            1.22
same           1.03
entirety       1.00
slightest      0.97
entire         0.97
majority       0.95
latter         0.94
longest        0.94
quickest       0.94
extent         0.92
Activations Density: 0.128%