INDEX
    Explanations

    phrases relating to reasons or justifications

    "reason" or "reasons"

    NEGATIVE LOGITS
    Token           Logit
    " Inoue"        -0.45
    " web"          -0.43
    " online"       -0.41
    " at"           -0.40
    " Goldberg"     -0.39
    " immersive"    -0.38
    " multi"        -0.38
    " Colbert"      -0.37
    " dök"          -0.36
    " dark"         -0.36
    POSITIVE LOGITS
    Token           Logit
    " Reason"       1.29
    "reason"        1.26
    " reason"       1.25
    "Reason"        1.23
    " Reasons"      1.19
    "REASON"        1.14
    " REASON"       1.14
    " reasons"      1.13
    "Reasons"       1.13
    "reasons"       1.06
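
    The positive-logit tokens above appear to differ only in case and in a leading space. A minimal sketch of that check follows, assuming the token/logit pairs are transcribed by hand from the list above. The names `positive_logits` and `normalize` are illustrative and do not come from any tool.

```python
# Top positive-logit tokens transcribed from the list above.
# A leading space is part of the token and is kept on purpose.
positive_logits = [
    (" Reason", 1.29),
    ("reason", 1.26),
    (" reason", 1.25),
    ("Reason", 1.23),
    (" Reasons", 1.19),
    ("REASON", 1.14),
    (" REASON", 1.14),
    (" reasons", 1.13),
    ("Reasons", 1.13),
    ("reasons", 1.06),
]


def normalize(token: str) -> str:
    """Drop surrounding whitespace and lowercase the token (hypothetical helper)."""
    return token.strip().lower()


# Collapse the ten surface forms to their distinct normalized forms.
stems = {normalize(token) for token, _ in positive_logits}
print(sorted(stems))  # ['reason', 'reasons']
```

    All ten entries collapse to two forms, which is consistent with the second explanation listed under Explanations.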
    Activation Density: 0.124%

    No Known Activations