INDEX
Explanations
numerical patterns within text
the occurrence of numerical figures or statistics in the text
Negative Logits
travers       -0.70
swinging      -0.68
hung          -0.68
heroine       -0.66
guarding      -0.66
brav          -0.65
rebuilding    -0.64
hust          -0.63
swallowing    -0.63
railways      -0.62
Positive Logits
SHARES        0.97
ctr           0.94
maxwell       0.89
:{            0.82
Expand        0.81
å¹            0.81
Explicit      0.80
ILCS          0.79
Reviewer      0.78
MN            0.76
Activations Density 0.128%