INDEX
    Explanations

    terms following specific words

    Negative Logits
    Token   Logit
    Па      0.90
    <eos>   0.84
    На      0.83
    Мо      0.83
    Та      0.82
    По      0.82
    За      0.80
    Ви      0.78
    Не      0.78
    До      0.76
    Positive Logits
    Token           Logit
     fundraising    1.13
     laryng         1.02
     är             1.01
     antisemit      1.01
     è              1.00
     softball       0.99
     melakukan      0.98
     talks          0.98
     sarà           0.98
     outperformed   0.97
    Activation Density: 0.001%

    No Known Activations