INDEX
    Explanations
    NEGATIVE LOGITS
     Bennett          -0.08
    _BEFORE           -0.07
    도로              -0.06
                      -0.06
    owania            -0.06
     cocaine          -0.06
     пись             -0.06
    ФЛ                -0.06
     expres           -0.06
     Computers        -0.06
    POSITIVE LOGITS
     myth             0.14
     Myth             0.13
     myths            0.11
     mythical         0.08
    .Q                0.07
     individually     0.07
     misinformation   0.07
     MT               0.07
    MT                0.07
    UTILITY           0.06
    Activation Density: 0.004%

    No Known Activations