INDEX
Explanations
explanationscripts, and communication
Negative Logits
în    0.75
in    0.70
new    0.70
ג    0.66
びっくり    0.65
with    0.63
at    0.62
במ    0.62
깜    0.62
preliminary    0.61
Positive Logits
혹은    0.90
yoki    0.86
หรือ    0.83
ataupun    0.80
もしくは    0.80
হোক    0.76
ಅಥವಾ    0.76
లేదా    0.75
或者    0.73
அல்லது    0.72
Activations Density 0.017%