INDEX
    NEGATIVE LOGITS
    0.52
     Москвы
    0.45
    0.45
     postaje
    0.44
    ק
    0.44
    0.44
     Colored
    0.44
    пь
    0.44
     postérieure
    0.43
    IN
    0.42
    POSITIVE LOGITS
    ner            0.49
     haben         0.46
     troubling     0.45
    iler           0.45
    roller         0.45
     sullen        0.45
     τ             0.45
     blatant       0.45
     wenn          0.43
     para          0.43
    Act Density 0.046%
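
    The density figure above is commonly read as the fraction of sampled token positions on which the feature fires at all. The sketch below illustrates that reading only; the function name `activation_density`, the zero threshold, and the sample data are assumptions for illustration, not taken from this page or from any particular tool.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Return the fraction of positions whose activation exceeds `threshold`.

    Assumption: "density" means (number of active positions) / (total positions).
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical sample: 100,000 token positions, 46 of them active.
acts = [1.0] * 46 + [0.0] * (100_000 - 46)
density = activation_density(acts)
print(f"Act Density {density:.3%}")
```

    Under this reading, a density of 0.046% corresponds to roughly 46 active positions per 100,000 sampled tokens.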

    No Known Activations