    Negative Logits
    Token                   Logit
    " Anliegen"             -0.08
    "’us"                   -0.08
    (token text missing)    -0.08
    "Slot"                  -0.08
    "口诀"                  -0.07
    " Stich"                -0.07
    "-wire"                 -0.07
    " rubbing"              -0.07
    " rubbed"               -0.07
    " Slot"                 -0.07
    Positive Logits
    Token                   Logit
    "traditional"           0.10
    " oppressive"           0.09
    (token text missing)    0.09
    " capitalism"           0.09
    " 기존"                 0.09
    "传统"                  0.08
    " traditional"          0.08
    " सरकार"                0.08
    " SMA"                  0.08
    " traditionele"         0.08
    Activation Density: 0.038%

    No Known Activations