INDEX
    Explanations

    prefix-suffix word construction

    Negative Logits
    Token                 Logit
    "on"                  0.93
    [no visible token]    0.80
    " was"                0.79
    [no visible token]    0.77
    "an"                  0.75
    "at"                  0.69
    "ap"                  0.64
    " كان"                0.64
    " landen"             0.62
    [no visible token]    0.60
    Positive Logits
    Token    Logit
    "2"      0.75
    "3"      0.75
    "7"      0.66
    "9"      0.66
    "0"      0.64
    "8"      0.62
    "1"      0.61
    "6"      0.58
    "4"      0.57
    "den"    0.54
    Act Density 1.440%

    No Known Activations