    Explanations

    model starting response

    Negative Logits
    Hidden        0.41
    commercial    0.39
    Conceptual    0.38
                  0.38
    Please        0.38
     Software     0.38
     commerciali  0.38
     acceptor     0.37
    Annex         0.37
     comerciais   0.37

    Positive Logits
     absolutely   0.43
     отлично      0.41
     отлич        0.41
     odlič        0.40
    тный          0.40
     абсолютно    0.37
     Bauern       0.37
    etics         0.37
     मिक्स        0.37
    Michelle      0.37
    Activation Density 0.105%

    No Known Activations