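    Negative Logits
    Token                    Value
    (token not recovered)    2.08
    are                      1.98
    il                       1.92
    (token not recovered)    1.88
    (token not recovered)    1.87
    ור                       1.74
    ل                        1.73
    です                     1.71
    (token not recovered)    1.70
    เรา                      1.69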
    Positive Logits
    Token                    Value
    Fonte                    1.93
    pronged                  1.89
    GRAM                     1.82
    boiled                   1.82
    ácter                    1.80
    ौनक                      1.80
    (token not recovered)    1.75
    GRA                      1.74
    GING                     1.74
    на                       1.73
    Activation Density: 0.110%

    No Known Activations