Explanations
phrases referring to ways in which actions or scenarios are compared or related
comparisons that express similarity or analogy between different subjects or concepts
Negative Logits
igh        -0.73
mun        -0.62
ONSORED    -0.60
Throw      -0.59
eri        -0.58
ategor     -0.57
McGee      -0.56
bart       -0.56
compe      -0.55
throw      -0.55
Positive Logits
ettings               0.79
rapists               0.70
ounter                0.70
isSpecialOrderable    0.69
achu                  0.69
abl                   0.68
liness                0.65
ractor                0.64
aws                   0.61
Cooke                 0.61
Activations Density: 0.045%