Explanations
reasons and justifications
phrases that indicate multiple reasons or justifications
Negative Logits
bilt                 -0.71
ream                 -0.71
xon                  -0.69
esc                  -0.67
Pixie                -0.66
externalActionCode   -0.66
ibaba                -0.66
franc                -0.65
quet                 -0.64
Nanto                -0.63
Positive Logits
why                   0.99
reasons               0.94
pointers              0.90
justifying            0.82
WHY                   0.82
why                   0.81
æĦ                    0.81
Reasons               0.81
arguments             0.81
explanations          0.78
Activations Density 0.023%