INDEX
Explanations
words related to justification or reasoning for actions, decisions, or opinions
terms related to justification and rationalization
Negative Logits
ummer          -0.82
ammy           -0.76
grass          -0.72
berry          -0.71
chron          -0.70
ept            -0.69
jong           -0.69
Carbuncle      -0.66
enfranch       -0.66
ker            -0.65
Positive Logits
justifying      1.07
justifies       0.99
justify         0.98
justification   0.97
why             0.82
excuses         0.81
¿½              0.76
WHY             0.76
justified       0.75
ighed           0.73
Activations Density 0.032%