INDEX
Explanations
imperative sentences directing action or behavior
instances of numerical references or rankings
Negative Logits
basket      -0.77
hust        -0.71
swinging    -0.69
hung        -0.68
guarding    -0.66
whipping    -0.65
swept       -0.63
imagining   -0.62
bered       -0.62
assum       -0.62
Positive Logits
theless     0.90
SHARES      0.80
å¹          0.79
Adds        0.77
ILCS        0.77
Expand      0.75
:{          0.75
Contents    0.74
Detected    0.74
maxwell     0.73
Activations Density 0.211%