INDEX
Explanations
surprisingly followed by unexpected outcomes
Negative Logits
किंवा  0.55
yoki  0.54
veya  0.52
ಅಥವಾ  0.49
或者  0.48
หรือ  0.45
અથવા  0.45
अक्सर  0.45
하거나  0.44
లేదా  0.44
Positive Logits
použit  0.43
appliquée  0.42
لتح  0.41
unemployed  0.40
опять  0.39
ancients  0.39
umet  0.39
obat  0.39
stimulated  0.38
expts  0.38
Activations Density 0.001%