INDEX
    Explanations
    NEGATIVE LOGITS
    Token         Logit
    "م"           0.72
    " spear"      0.64
    "Data"        0.63
    "Query"       0.62
    " Country"    0.62
    " large"      0.61
    "ین"          0.60
    "data"        0.59
    " lock"       0.59
    "Row"         0.59
    POSITIVE LOGITS
    Token             Logit
    " cigarettes"     1.09
    " tobacco"        1.01
    " cigarette"      0.96
    " Tobacco"        0.93
    "🚬"              0.93
    " smoking"        0.92
    "🚭"              0.91
    " nicotine"       0.90
    "Tobacco"         0.89
    "cigarettes"      0.88
    Activation Density: 0.026%

    No Known Activations