    Negative Logits
    Token          Logit
    "t"            0.72
    " вечером"     0.52
    " ފ"           0.51
    (not shown)    0.50
    "b"            0.49
    (not shown)    0.47
    (not shown)    0.47
    " رسم"         0.46
    "d"            0.46
    " nightlife"   0.45
    Positive Logits
    Token          Logit
    "ра"           0.54
    "τα"           0.54
    "рованной"     0.54
    " lunch"       0.53
    "ευ"           0.52
    "éal"          0.52
    " Lunch"       0.51
    "pdbonly"      0.51
    "ме"           0.50
    " lunches"     0.50
    Activation Density: 0.039%

    No Known Activations