    Negative Logits
    " rozvoj"      -0.07
    " DEN"         -0.07
    "idden"        -0.07
    " Standing"    -0.06
    "elerinde"     -0.06
    "_finder"      -0.06
    " Kromě"       -0.06
    " плен"        -0.06
    " Shawn"       -0.06
    " відмов"      -0.06
    Positive Logits
    " Mercury"        0.13
    " mercury"        0.09
    "cury"            0.09
    "heard"           0.08
    " thermometer"    0.07
    "Sorted"          0.07
    "/gui"            0.07
    "Vectorizer"      0.06
    " echoes"         0.06
    "herent"          0.06
    Activation Density: 0.002%

    No Known Activations