INDEX
    Explanations

    phrases indicating a sense of loss or detachment

    Negative Logits
    "icz"        -0.15
    "785"        -0.14
    "achi"       -0.14
    "arios"      -0.14
    "beit"       -0.14
    "lad"        -0.13
    "itals"      -0.13
    "ÐĴС"        -0.13
    "ustral"     -0.13
    "stances"    -0.13
    Positive Logits
    " away"      1.83
    " Away"      1.61
    "away"       1.45
    "Away"       1.41
    "-away"      1.34
    "aways"      0.77
    " weg"       0.77
    " AW"        0.57
    ".aw"        0.55
    "awy"        0.48
    Activation Density: 0.551%

    No Known Activations