    Explanations

    correct answer or sentence

    Negative Logits
        Token          Value
        (not shown)    0.98
        ){             0.94
         thiab         0.93
        CT             0.93
         histórias     0.91
        \">            0.88
        );             0.88
        "/"            0.87
        (not shown)    0.87
         erweitert     0.86
    Positive Logits
        Token          Value
        as             1.17
        (not shown)    1.06
        man            1.02
        b              1.02
        on             1.00
        ма             0.95
        ى              0.93
        il             0.92
        ம்              0.90
        y              0.88
    Activation Density: 0.403%

    No Known Activations