INDEX
    Explanations

    spending time on activities

    Negative Logits
    "as"          0.68
    "ted"         0.68
    "idan"        0.67
    "to"          0.65
    "form"        0.63
    "en"          0.63
    "in"          0.61
    " have"       0.60
    "will"        0.60
    " staunch"    0.60
    Positive Logits
    "ла"          0.80
    " água"       0.79
    "ı"           0.75
    " coleção"    0.74
                  0.73
    "ับ"          0.71
                  0.69
    "ния"         0.67
    "ни"          0.66
    "ло"          0.66
    Act Density 0.001%

    No Known Activations