    Negative Logits
    Token            Logit
    (whitespace)     0.29
    "!"              0.23
    " cinema"        0.23
    " quilt"         0.21
    "kannya"         0.21
    " gymnasium"     0.21
    " kitchen"       0.20
    " sliders"       0.20
    " datos"         0.20
    " bakery"        0.20

    Positive Logits
    Token            Logit
    "Drinks"         0.29
    " Drinks"        0.29
    "Drinking"       0.29
    " Drinking"      0.28
    "ك"              0.27
    "Drink"          0.26
    " Drink"         0.26
    "Bott"           0.24
    " uống"          0.24
    " Beverages"     0.24

    Activation Density: 0.191%

    No Known Activations