    Explanations

    references to various types of alcoholic beverages

    Negative Logits
    Token         Logit
    "ipt"         -0.16
    "ارات"        -0.15
    " dist"       -0.15
    " repr"       -0.14
    "coc"         -0.14
    "pek"         -0.14
    "sun"         -0.14
    "_chance"     -0.14
    "ÑįÑĦ"        -0.14
    " Works"      -0.13

    Positive Logits
    Token         Logit
    " white"      0.28
    " whites"     0.28
    " ries"       0.26
    " Gew"        0.24
    " Pin"        0.24
    " Mos"        0.24
    " ros"        0.24
    "white"       0.24
    " Cab"        0.23
    " wines"      0.23

    Activation Density: 0.043%

    No Known Activations