INDEX
    Explanations
    Negative Logits
    Token          Logit
    "328"          -0.07
    " pricey"      -0.07
    " score"       -0.07
    "bnb"          -0.07
    " کام"         -0.07
    "OURS"         -0.07
    " фон"         -0.07
    " System"      -0.06
    "Subjects"     -0.06
    "zet"          -0.06
    Positive Logits
    Token          Logit
    " antenna"     0.11
    " antennas"    0.11
    " antenn"      0.06
    "θι"           0.06
    "pun"          0.06
    "ANTA"         0.06
    "Try"          0.06
    "..\"          0.06
    "gn"           0.06
    " artır"       0.06
    Act Density 0.002%

    No Known Activations