INDEX
Explanations
words related to justification or reasoning for actions
terms related to justification and reasoning
Negative Logits
ummer         -0.84
Carbuncle     -0.76
chron         -0.72
ammy          -0.72
Hop           -0.71
estone        -0.70
berry         -0.69
ept           -0.68
grass         -0.68
thritis       -0.67
Positive Logits
justifying     1.00
justification  0.89
justify        0.87
justifies      0.87
why            0.79
ably           0.74
justified      0.73
excuses        0.72
guiActiveUn    0.70
ighed          0.69
Activations Density 0.032%