INDEX
Explanations
statements explaining reasons or causes
phrases that explain reasons or justifications
Negative Logits
kees      -0.70
torch     -0.69
odes      -0.69
apixel    -0.69
nces      -0.68
kun       -0.66
adiq      -0.66
uania     -0.65
leted     -0.63
dig       -0.63
Positive Logits
Reason    1.13
cause     1.09
Because   1.05
reasons   1.00
Cause     1.00
Because   0.97
because   0.94
ecause    0.93
because   0.87
Reasons   0.84
Activations Density: 0.216%