INDEX
    Explanations

    numerical values

    Negative Logits
     belong      -0.80
     belongs     -0.76
     separat     -0.68
     Goodbye     -0.66
     Badge       -0.65
    izable       -0.65
     Stability   -0.65
    士           -0.65
     Carry       -0.64
     Tags        -0.64

    Positive Logits
     able         1.34
     unable       1.26
     surprised    1.20
     hesitant     1.19
     reluctant    1.18
     alerted      1.15
     unaware      1.14
     aware        1.13
     astonished   1.11
     pleased      1.11
    Activation Density: 0.271%

    No Known Activations