Explanations
phrases indicating causation or justification
Negative Logits
Majefty             -0.59
poffe               -0.55
Houſe               -0.51
Jefus               -0.47
houſe               -0.43
himſelf             -0.43
laquo               -0.42
[token not shown]   -0.42
itſelf              -0.42
webElementXpaths    -0.41

Positive Logits
reason              0.83
Reasons             0.77
Reason              0.74
reasons             0.72
Reasons             0.72
Reason              0.71
reason              0.67
why                 0.61
REASON              0.61
varför              0.60   (Swedish for "why")

Activation density: 0.020% of tokens