    Negative Logits
      Token       Logit
      "in"        0.70
      "ו"         0.60
      "ות"        0.58
      "u"         0.54
      "ק"         0.49
      "на"        0.47
      "ad"        0.47
      "ח"         0.44
      "ט"         0.44
      " For"      0.44
    Positive Logits
      Token       Logit
      " nebo"     0.46
      "\"         0.44
      " falam"    0.42
      " agua"     0.41
      " frio"     0.40
      "fläche"    0.40
      " čo"       0.39
                  0.39
      "ாவின்"     0.39
      " لكن"      0.39
    Activation Density: 0.731%
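    Activation density is usually read as the percentage of token positions on which the feature fires, so 0.731% corresponds to roughly 7 tokens in every 1,000. A minimal sketch, assuming "fires" means an activation strictly above a threshold of 0; the function name `activation_density` and the threshold are illustrative assumptions, not the dashboard's documented definition.

```python
def activation_density(activations, threshold=0.0):
    """Percentage of token positions whose activation exceeds `threshold`.

    Assumes a feature counts as active when its activation is strictly
    greater than the threshold (0.0 by default).
    """
    if not activations:
        raise ValueError("need at least one activation")
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


# 731 active positions out of 100,000 tokens gives a density of about 0.731%.
density = activation_density([0.5] * 731 + [0.0] * (100_000 - 731))
print(f"{density:.3f}%")
```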

    No Known Activations