Explanations
No Explanations Found
NEGATIVE LOGITS
ی    0.90
و    0.84
화학    0.83
ك    0.75
냉    0.74
Italie    0.73
잠    0.73
에도    0.72
이    0.71
ако    0.71
POSITIVE LOGITS
wounded    0.86
alphas    0.79
ulcers    0.76
clogged    0.74
almonds    0.73
ups    0.73
choroby    0.73
дной    0.71
unwavering    0.71
ascended    0.71
Activations Density 0.000%