INDEX
    Explanations

    references to various types of alcoholic beverages

    Negative Logits
    Token            Logit
    "CHELL"          -0.43
    "Att"            -0.41
    "InputBorder"    -0.40
    "Adapt"          -0.40
    [missing]        -0.40
    "gnore"          -0.40
    " vicin"         -0.39
    "Dat"            -0.38
    " الحد"          -0.38
    "AllowUser"      -0.38
    Positive Logits
    Token            Logit
    " Whiskey"       1.23
    " whiskey"       1.22
    "Whiskey"        1.15
    " Whisky"        1.12
    " whisky"        1.09
    "whiskey"        1.07
    "whisky"         0.86
    " bourbon"       0.75
    " Bourbon"       0.73
    "🥃"             0.73
    Activation Density: 0.002%

    No Known Activations