    Explanations

    type followed by specific descriptor

    Negative Logits
     História       0.76
     Histogram      0.72
     Probleme       0.70
     Hobbit         0.68
     Failed         0.68
     Device         0.66
     Emotional      0.66
     Goodbye        0.66
     Decrease       0.66
     Revisited      0.66
    Positive Logits
     widths         0.65
    illing          0.64
     दोनों          0.63
     gratification  0.59
     উভয়           0.59
    robes           0.58
    linge           0.57
     định           0.56
     manners        0.56
    ipak            0.56
    Activation Density: 0.008%

    No Known Activations