    Negative Logits
    TEMP            -0.07
    */↵↵            -0.07
     namely         -0.07
     Vikings        -0.07
     Coloring       -0.06
     distributors   -0.06
     Tent           -0.06
    106             -0.06
     ath            -0.06
     vip            -0.06
    Positive Logits
     лицо           0.07
    ikan            0.07
     politic        0.06
    (sensor         0.06
     Shin           0.06
    ерж             0.06
     Pros           0.06
    .params         0.06
     conventions    0.06
     пост           0.06
    Act Density 0.221%

    No Known Activations