INDEX
Explanations
Ara h, Vicuna, Man Hemlock, HER2, Sbarro, H
Negative Logits
ل	0.91
ون	0.88
ج	0.85
માં	0.79
ق	0.77
ע	0.75
ку	0.75
ف	0.74
ח	0.74
in	0.73
Positive Logits
0.79
to	0.60
,	0.51
was	0.49
piccoli	0.48
bessere	0.48
$,	0.47
Çok	0.47
Но	0.46
był	0.46
Activations Density 0.268%