INDEX
    Explanations

    Howard followed by a name

    NEGATIVE LOGITS
    Token                      Logit
    1                          1.93
    (token missing in source)  1.59
    4                          1.48
    5                          1.43
    (space)                    1.35
    7                          1.34
    OL                         1.29
    した                       1.27
    ש                          1.27
    ни                         1.26
    POSITIVE LOGITS
    Token  Logit
    an     1.89
    é      1.50
    ;      1.48
    ان     1.46
    it     1.42
    f      1.31
    n      1.30
    a      1.27
    t      1.27
    :      1.25
    Act Density 0.005%

    No Known Activations