INDEX
    Explanations

    topic + noun/descriptor

    Negative Logits
     in          0.88
    ש            0.87
    dengan       0.76
    いっぱい     0.73
    ata          0.71
    ال           0.71
    İ            0.70
    dır          0.67
    א            0.66
     as          0.65
    Positive Logits
    5            0.87
    4            0.77
    н            0.74
    )            0.72
                 0.68
    :            0.68
    ٥            0.67
                 0.67
    ۵            0.66
                 0.66
    Act Density 0.045%

    No Known Activations