INDEX
Explanations
providing explanations or reasons
Negative Logits
სპეცი  0.54
världen  0.53
कायदा  0.47
Еўро  0.47
otherapie  0.46
ännu  0.46
frågor  0.44
💾  0.44
gravar  0.44
会不会  0.44
Positive Logits
although  0.52
four  0.49
because  0.48
seven  0.46
because  0.46
although  0.45
Although  0.45
唯一  0.45
Because  0.45
selected  0.44
Activations Density 0.033%
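
To put the density figure in concrete terms, the sketch below assumes that "activations density" means the fraction of token positions on which the feature fires (activation greater than zero); the dashboard itself does not define the term. The function name `activation_density` and the sample data are hypothetical and chosen only to reproduce a value of 0.033%.

```python
def activation_density(activations):
    """Return the fraction of positions with a strictly positive activation.

    Assumption: "density" is the share of token positions where the
    feature is active. The source dashboard does not spell this out.
    """
    active = sum(1 for a in activations if a > 0)
    return active / len(activations)


# Hypothetical sample: 100,000 token positions, 33 of which activate the feature.
acts = [1.0] * 33 + [0.0] * (100_000 - 33)
density = activation_density(acts)
print(f"{density:.3%}")
```

Under that reading, a density of 0.033% corresponds to roughly 33 active positions per 100,000 tokens, which is typical of a sparse feature.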