Explanations
asking clarifying questions
NEGATIVE LOGITS
pthread      0.84
Spo          0.79
cję          0.77
Arts         0.76
sera         0.76
Stre         0.76
ఫ్           0.74
Substring    0.73
npm          0.72
Econom       0.71
POSITIVE LOGITS
questions    1.55
question     1.19
Questions    1.15
probing      1.14
permission   1.10
about        1.08
how          1.06
rhet         1.03
clarifying   1.01
Fragen       1.01
Activations Density 0.022%