    Explanations

    LaTeX and BibTeX citations

    Negative Logits
    Token              Logit
    "("                0.57
    " ("               0.56
    " κά"              0.56
    (not rendered)     0.52
    " protr"           0.49
    " satu"            0.48
    " η"               0.48
    " denominador"     0.48
    " וכ"              0.48
    " στις"            0.46

    Positive Logits
    Token              Logit
    "ك"                0.73
    "خ"                0.66
    "на"               0.65
    "ের"               0.65
    "ни"               0.62
    "in"               0.61
    "3"                0.61
    "இல்"              0.59
    "و"                0.58
    "ın"               0.57
    Activation Density: 0.000%

    No Known Activations