Explanations
question prompts and instructions
Negative Logits
but          0.92
それを       0.82
ovviamente   0.82
αλλά         0.80
évidemment   0.80
όχι          0.79
ಅದು          0.79
фактически   0.78
거고         0.77
essentially  0.76
Positive Logits
Suppose      1.31
Suppose      1.24
When         1.16
During       1.11
When         1.10
During       1.04
After        1.03
Podczas      1.01
Imagine      1.00
Following    0.97
Activations Density: 0.334%