    Explanations

    inferring implications and nuances

    NEGATIVE LOGITS
    " ایم"  0.47
    "说法"  0.45
    "めの"  0.45
    "funny"  0.44
    "ytra"  0.44
    " éditions"  0.43
    " око"  0.42
    " بح"  0.42
    "pha"  0.42
    " enjo"  0.42
    POSITIVE LOGITS
    "і"  0.50
    "ъ"  0.50
    " Castile"  0.46
    " Grec"  0.45
    "и"  0.45
    " classique"  0.43
    "undance"  0.42
    " tbl"  0.42
    "𝐢"  0.42
    0.41
    Activation density: 0.002%

    No Known Activations