INDEX
Explanations
phrases or questions related to visual comparisons or hypothetical scenarios
phrases that inquire about appearances or states of being
Negative Logits
Americ      -0.70
Force       -0.59
tsy         -0.58
cipl        -0.57
arters      -0.55
resent      -0.55
flurry      -0.55
Sharp       -0.55
helps       -0.54
ãĥĥãĥī      -0.54
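The last negative-logit entry, `ãĥĥãĥī`, looks like mojibake but is more likely the raw form of a byte-level BPE token. The sketch below assumes the model uses the GPT-2 style byte-to-unicode table (the page does not name the tokenizer, so this is an assumption) and inverts that table to recover the underlying UTF-8 text. The helper names are hypothetical.

```python
def bytes_to_unicode():
    """Build the GPT-2 style map from byte values to printable characters.

    Assumption: the dashboard's model uses this byte-level BPE convention.
    """
    printable = (
        list(range(ord("!"), ord("~") + 1))
        + list(range(ord("¡"), ord("¬") + 1))
        + list(range(ord("®"), ord("ÿ") + 1))
    )
    byte_values = printable[:]
    code_points = printable[:]
    shift = 0
    for b in range(256):
        if b not in printable:
            # Non-printable bytes are shifted into the range starting at U+0100.
            byte_values.append(b)
            code_points.append(256 + shift)
            shift += 1
    return dict(zip(byte_values, (chr(c) for c in code_points)))


# Inverse table: displayed character -> original byte value.
UNICODE_TO_BYTE = {ch: b for b, ch in bytes_to_unicode().items()}


def decode_token(token: str) -> str:
    """Turn a byte-level token string back into readable UTF-8 text."""
    raw = bytes(UNICODE_TO_BYTE[ch] for ch in token)
    return raw.decode("utf-8", errors="replace")


print(decode_token("ãĥĥãĥī"))
```

Under that assumption the token decodes to two katakana characters, which is why it is kept verbatim in the list rather than treated as encoding debris.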
Positive Logits
lihood      0.81
WITHOUT     0.81
liest       0.78
inside      0.75
beforehand  0.74
without     0.74
nowadays    0.73
unto        0.73
compared    0.72
outside     0.71
Activations Density 0.045%
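As a rough illustration of what a density figure like this can mean, the sketch below assumes it is the fraction of sampled tokens on which the feature's activation is above zero. The page does not define the metric, so both that definition and the function name `activation_density` are assumptions, and the sample values are synthetic, chosen only to reproduce the same percentage.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of tokens whose activation exceeds `threshold`.

    Assumed definition: density = (number of active tokens) / (tokens sampled).
    """
    if not activations:
        raise ValueError("need at least one activation value")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Synthetic example: 9 active tokens out of 20,000 sampled.
sample = [0.0] * 19991 + [1.2, 0.4, 3.1, 0.9, 2.2, 0.7, 1.5, 0.3, 0.8]
density = activation_density(sample)
print(f"{density:.3%}")
```

A density this low would mean the feature fires on roughly one token in every two thousand, which fits a feature tied to a narrow family of phrases.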