Explanations
affirmation and confirmation
Negative Logits
नसल्या    0.42
Unless    0.39
लगाएं    0.38
Nowhere    0.38
نباش    0.37
Hardly    0.36
없다    0.36
Rarely    0.36
없이    0.36
我们将    0.36
Positive Logits
确实    1.93
indeed    1.77
確實    1.73
memang    1.59
確かに    1.48
indeed    1.46
DID    1.39
的确    1.39
Indeed    1.36
DOES    1.36
Activations Density: 0.067%
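
The density figure is usually read as the share of token positions on which the feature fires. The sketch below illustrates that reading only; the page does not state its exact definition, so the function name `activation_density`, the `threshold` parameter, and the "activation greater than zero" rule are all assumptions, and the toy data is made up.

```python
def activation_density(activations, threshold=0.0):
    """Fraction of token positions whose activation exceeds `threshold`.

    Assumption: "active" means strictly greater than the threshold.
    The dashboard's own definition may differ.
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Toy data (invented for illustration): 2 active positions out of 3000.
toy = [0.0] * 2998 + [1.2, 0.4]
density = activation_density(toy)
print(f"{density:.3%}")  # prints 0.067%
```

Under this reading, a density of 0.067% means the feature is active on roughly 2 of every 3000 tokens, which fits a feature tied to a narrow set of words.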