INDEX
Explanations
words related to offering potential reasons or justifications
phrases related to providing explanations or justifications
Negative Logits
illet     -0.80
estial    -0.76
zig       -0.73
opers     -0.72
sembly    -0.72
ibaba     -0.71
shr       -0.70
oned      -0.69
ymph      -0.69
raid      -0.68
Positive Logits
WHY             1.07
why             1.00
explanations    0.91
why             0.86
explanation     0.85
thereof         0.78
rationale       0.75
explan          0.75
explaining      0.72
ation           0.71
Activations Density 0.044%