    Explanations

    phrases indicating causation or justification

    Negative Logits
    " Majefty"          -0.59
    " poffe"            -0.55
    " Houſe"            -0.51
    " Jefus"            -0.47
    " houſe"            -0.43
    " himſelf"          -0.43
    "laquo"             -0.42
    "󠁮"                  -0.42
    " itſelf"           -0.42
    "webElementXpaths"  -0.41
    Positive Logits
    " reason"           0.83
    " Reasons"          0.77
    " Reason"           0.74
    " reasons"          0.72
    "Reasons"           0.72
    "Reason"            0.71
    "reason"            0.67
    " why"              0.61
    "REASON"            0.61
    " varför"           0.60
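    The two lists above rank vocabulary tokens by how strongly this feature pushes their logits down or up. A common way to build such lists, assumed here rather than confirmed by this page, is to project the feature's decoder direction through the model's unembedding matrix and keep the k lowest and k highest scores. The sketch below uses plain Python lists. `top_logit_effects`, `decoder_dir`, and `unembed` are hypothetical names, and the toy vectors are illustrative, not this feature's real weights.

```python
from typing import Dict, List, Tuple

# Hypothetical alias: a list of (token, logit effect) pairs.
Scored = List[Tuple[str, float]]


def top_logit_effects(
    decoder_dir: List[float],
    unembed: Dict[str, List[float]],
    k: int = 10,
) -> Tuple[Scored, Scored]:
    """Return (most negative, most positive) logit effects of one feature.

    Assumption: a token's score is the dot product of the feature's decoder
    direction with that token's unembedding vector.
    """
    scores = {
        tok: sum(d * u for d, u in zip(decoder_dir, vec))
        for tok, vec in unembed.items()
    }
    ranked = sorted(scores.items(), key=lambda kv: kv[1])  # ascending by score
    negative = ranked[:k]        # most suppressed tokens first
    positive = ranked[::-1][:k]  # most promoted tokens first
    return negative, positive


# Toy 2-dimensional example (illustrative values, not real model weights).
toy_unembed = {
    " reason": [0.5, 0.5],
    " why": [0.25, 0.5],
    " Houſe": [-0.5, -0.25],
}
neg, pos = top_logit_effects([1.0, 0.5], toy_unembed, k=1)
print("negative:", neg)  # prints negative: [(' Houſe', -0.625)]
print("positive:", pos)  # prints positive: [(' reason', 0.75)]
```

    Sorting the whole vocabulary keeps the sketch simple; for a real vocabulary of tens of thousands of tokens, `heapq.nsmallest` and `heapq.nlargest` select the top k without a full sort.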
    Activation Density: 0.020%
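    Activation density is the share of tokens on which the feature fires. A value of 0.020% works out to roughly 2 active tokens per 10,000, so this is a very sparse feature. The sketch below assumes that "fires" means an activation strictly above a threshold of 0. `activation_density` and `threshold` are hypothetical names, and the dashboard's exact definition may differ.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Return the percentage of tokens whose activation exceeds `threshold`.

    Assumption: a token counts as active when its activation is strictly
    greater than `threshold` (0.0 by default).
    """
    if not activations:
        return 0.0  # no tokens observed, so nothing fired
    fired = sum(1 for a in activations if a > threshold)
    return 100.0 * fired / len(activations)


# Toy example: 2 active tokens out of 10,000 observed tokens.
toy = [0.0] * 9_998 + [1.3, 0.7]
print(f"{activation_density(toy):.3f}%")  # prints 0.020%
```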

    No Known Activations