INDEX
    Explanations

    words reflecting personal relationships and emotional connections

    Negative Logits
    Token               Logit
    " unavailable"      -0.65
    "absent"            -0.60
    " impossible"       -0.57
    " unseen"           -0.57
    " Absent"           -0.54
    " absent"           -0.53
    " impossibility"    -0.50
    " lacked"           -0.49
    " Impossible"       -0.48
    " unthinkable"      -0.48
    Positive Logits
    Token               Logit
    " doesn"            1.61
    "doesn"             1.40
    " Doesn"            1.39
    " does"             1.39
    "Does"              1.29
    " don"              1.29
    " Does"             1.28
    "does"              1.27
    "Doesn"             1.27
    " doesnt"           1.24
    Activation Density: 0.228%

    No Known Activations