Explanations
reasons or justifications for a situation or phenomenon
Negative Logits
Token       Logit
illet       -1.22
jet         -1.06
engeance    -1.03
estial      -1.00
luster      -0.99
emp         -0.93
ammy        -0.93
nir         -0.91
ategory     -0.91
mire        -0.91
Positive Logits
Token         Logit
why           1.45
WHY           1.31
udic          1.09
why           1.08
ĸļ            1.07
how           1.01
ance          1.01
cases         0.99
¿½            0.98
explanations  0.97
Activation Density: 0.888%