INDEX
    Explanations

    keywords associated with providing explanations or justifications

    phrases explaining motivations or justifications

    Negative Logits
    Token            Logit
    ' stocking'      -0.70
    'annis'          -0.70
    ' puck'          -0.68
    'KY'             -0.67
    'ymph'           -0.67
    'aeper'          -0.66
    ' helicop'       -0.65
    ' Roller'        -0.65
    'avorite'        -0.65
    'thus'           -0.65
    Positive Logits
    Token            Logit
    ' why'           1.10
    ' WHY'           0.98
    'why'            0.88
    'abl'            0.83
    ' reason'        0.80
    ' rationale'     0.77
    'usercontent'    0.76
    ' justifying'    0.75
    'orial'          0.75
    '="#'            0.74
    Activation Density: 0.025%

    No Known Activations