    Explanations

    references to showing or demonstrating qualities and attributes

    Negative Logits
    " Reputation"   -0.19
    "้ง"            -0.15
    " reputation"   -0.15
    "elier"         -0.15
    "nika"          -0.15
    "kar"           -0.14
    "nish"          -0.14
    " anonymity"    -0.14
    "kur"           -0.13
    "inand"         -0.13
    Positive Logits
    " signs"        0.35
    " Signs"        0.32
    " how"          0.28
    " why"          0.25
    " off"          0.24
    " initiative"   0.22
    " evidence"     0.22
    "boat"          0.21
    " promise"      0.20
    " hvordan"      0.20
    Act Density 0.092%

    No Known Activations