INDEX
    Explanations

    positive adjectives

    Negative Logits
    Token           Logit
    "(window"       -0.06
    " Bed"          -0.06
    " داده"         -0.06
    "_clock"        -0.06
    ".threshold"    -0.06
    " ок"           -0.06
    "battle"        -0.06
    "уль"           -0.06
    "ýv"            -0.06
    "animal"        -0.06
    Positive Logits
    Token           Logit
    "adoo"          0.07
    (token missing) 0.06
    " Sync"         0.06
    " milfs"        0.06
    (token missing) 0.06
    " suppose"      0.06
    "ерш"           0.06
    "Diese"         0.06
    " Forward"      0.06
    "Persistent"    0.06
    Activation Density: 0.019%

    No Known Activations