INDEX
Explanations
phrases indicating a request or suggestion
Negative Logits
"},"        -0.71
¶           -0.69
prototype   -0.66
AMP         -0.61
DN          -0.61
anguages    -0.59
Mehran      -0.57
constitu    -0.57
](          -0.56
iege        -0.56
Positive Logits
yourselves  1.43
yourself    1.09
me          0.88
kidding     0.86
ichever     0.82
ye          0.82
beware      0.80
ifully      0.80
thy         0.79
quote       0.78
Activations Density: 0.155%