INDEX
Explanations
phrases indicating responses to various situations
Negative Logits
LOCKS     -0.17
ernet     -0.15
icago     -0.15
ÅĻet      -0.14
ecture    -0.14
vet       -0.14
WISE      -0.14
Pulse     -0.14
loth      -0.14
icious    -0.14
Positive Logits
/response            0.21
ivate                0.19
(Response            0.18
=response            0.18
ToSelector           0.18
<|begin_of_text|>    0.17
-response            0.17
.Response            0.17
Drag                 0.16
Response             0.16
Activations Density: 0.067%
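
As a rough illustration of the density figure, the sketch below assumes that activation density means the fraction of token positions on which the feature's activation is above zero. That definition, the function name `activation_density`, and the sample data are assumptions made for illustration; they are not taken from the dashboard itself.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of token positions whose activation exceeds threshold.

    Assumption: "density" is read here as active positions / total positions.
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical data: the feature fires on 2 of 3000 token positions.
sample = [0.0] * 2998 + [1.3, 0.4]
density = activation_density(sample)
print(f"{density:.3%}")  # prints 0.067%
```

Under this reading, a density of 0.067% corresponds to the feature firing on roughly 2 of every 3000 tokens, which marks it as a sparse feature.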