INDEX
    Explanations

    expressions of strong emotions and personal experiences

    Negative Logits
    Token       Logit
    æ©          -0.63
     withd      -0.58
     redes      -0.56
     Cooke      -0.55
    imposed     -0.53
    etheless    -0.53
    ļéĨĴ        -0.52
    lished      -0.52
    jri         -0.51
    req         -0.51

    Positive Logits
    Token       Logit
    "—          0.97
    "?          0.97
    ,'"         0.96
    "]          0.96
    ")          0.94
    %"          0.91
    "),         0.87
    ":          0.83
    zbollah     0.83
    .")         0.83
    Activation Density: 0.268%

    No Known Activations