Explanations
phrases related to shutting things down or turning things off
Negative Logits
Token           Logit
enegger         -0.72
ozy             -0.71
arya            -0.69
omething        -0.68
lihood          -0.68
IMAGES          -0.67
xual            -0.67
ãĥīãĥ©ãĤ´ãĥ³    -0.66
Plenty          -0.65
intent          -0.63
Positive Logits
Token           Logit
tered           1.35
tering          1.18
downs           1.01
down            0.97
down            0.96
shut            0.90
doors           0.88
shut            0.83
downs           0.83
ters            0.81
Activations Density 0.556%