    Negative Logits
    Token          Logit
    " shiny"       -0.07
    " wid"         -0.07
    " основ"       -0.07
    " promised"    -0.06
    " organized"   -0.06
    "enda"         -0.06
    " rims"        -0.06
    "ily"          -0.06
    "ina"          -0.06
    " handed"      -0.06
    Positive Logits
    Token          Logit
    " detection"   0.17
    " Detection"   0.16
    " detect"      0.16
    " detected"    0.13
    " detecting"   0.13
    " Detect"      0.12
    " detector"    0.12
    " detects"     0.12
    "detector"     0.12
    "Detect"       0.12
    Act Density 0.025%

    No Known Activations