INDEX
Explanations
descriptive phrases related to the physical environment
NEGATIVE LOGITS
chairs       -0.82
idents       -0.80
encers       -0.77
yond         -0.73
olas         -0.73
eeds         -0.72
lees         -0.71
bots         -0.71
enes         -0.71
masters      -0.69
POSITIVE LOGITS
hurdle       1.13
installment  0.97
thing        0.97
glimpse      0.94
dose         0.94
reminder     0.92
glance       0.90
chance       0.90
disclaimer   0.88
piece        0.87
Activations Density: 0.074%