    Explanations

    potential, exclusive, or satisfactory

    Negative Logits
    Token            Value
    "yscrapers"      0.68
    "funk"           0.62
    "Homo"           0.61
    "ibraries"       0.59
    "irds"           0.59
    " vulgaris"      0.57
    "ieros"          0.57
    " ecosystems"    0.56
    "humans"         0.56
    "SPACE"          0.56
    Positive Logits
    Token              Value
    " slightly"        0.91
    " ಸ್ವಲ್ಪ"            0.90
    " కొంత"            0.89
    " இரண்டாவது"       0.85
    " relatively"      0.84
    " 약간"            0.80
    " satisfactory"    0.80
    " Slightly"        0.79
    " sedikit"         0.78
    " சற்று"            0.77
    Activation Density: 0.001%

    No Known Activations