INDEX
Explanations
your request or instruction
Negative Logits
Responses    0.87
Antworten    0.78
Responses    0.77
Answers      0.76
responses    0.76
Answers      0.72
answers      0.71
responses    0.68
雑貨         0.68
Outcomes     0.67
Positive Logits
query          1.09
requête        1.04
запрос         1.00
message        1.00
query          0.93
instruction    0.92
provocative    0.92
instructions   0.91
plea           0.90
prompt         0.90
Activations Density 0.271%