INDEX
Explanations
No Explanations Found
Negative Logits
Token           Value
A               0.48
-               0.46
ilah            0.46
ises            0.44
iveness         0.43
’.              0.42
agan            0.42
Disabilities    0.42
].              0.41
istic           0.41
Positive Logits
Token           Value
ت               0.93
as              0.92
ك               0.90
i               0.88
ي               0.88
c               0.82
on              0.79
त               0.78
ed              0.77
т               0.77
Activations Density: 3.429%