INDEX
    Explanations

    references to goals, aims, and intentions within the text

    Negative Logits
    ossal     -0.17
    lf        -0.15
    uden      -0.15
    681       -0.15
    culus     -0.15
    erves     -0.14
    plit      -0.14
    алу       -0.14
    165       -0.14
    conom     -0.14
    Positive Logits
    egot      0.19
    IPA       0.16
    maları    0.16
    ewe       0.15
    íķ        0.15
    ivor      0.15
    िव        0.14
    lest      0.14
    итом      0.14
    Tro       0.14
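    A minimal sketch of how ranked logit lists like these are often produced. It assumes the feature has a decoder direction in the model's residual stream and that each vocabulary token's "logit" is the dot product of that direction with the token's unembedding column. This is an assumption: the page does not state its method. The function names, the 2-dimensional weights, and the 4-token vocabulary are hypothetical toy values, not the real model.

```python
# Hypothetical sketch (assumed method, toy weights): rank tokens by the dot
# product of a feature's decoder direction with each unembedding column.

def logit_effects(decoder_vec, unembed, vocab):
    """Return (token, effect) pairs, where effect = decoder_vec . unembed[:, j]."""
    effects = []
    for j, token in enumerate(vocab):
        column = [row[j] for row in unembed]  # unembedding column for token j
        effect = sum(d * w for d, w in zip(decoder_vec, column))
        effects.append((token, effect))
    return effects

def top_and_bottom(effects, k):
    """Return the k most positive and the k most negative (token, effect) pairs."""
    ranked = sorted(effects, key=lambda pair: pair[1])  # ascending by effect
    negative = ranked[:k]
    positive = ranked[::-1][:k]
    return positive, negative

# Toy example: a 2-dimensional residual stream and a 4-token vocabulary.
decoder_vec = [1.0, -0.5]
unembed = [
    [0.2, -0.1, 0.05, 0.0],  # residual dimension 0
    [0.0, 0.3, -0.2, 0.1],   # residual dimension 1
]
vocab = ["egot", "ossal", "IPA", "lf"]

effects = logit_effects(decoder_vec, unembed, vocab)
positive, negative = top_and_bottom(effects, k=2)
print("positive:", [t for t, _ in positive])  # → ['egot', 'IPA']
print("negative:", [t for t, _ in negative])  # → ['ossal', 'lf']
```

    With these toy weights the effects are egot 0.2, IPA 0.15, lf -0.05 and ossal -0.25. The two lists are therefore just the two ends of one sorted ranking.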
    Activation Density: 0.143%
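    A minimal sketch of one common definition of activation density. It assumes density is the percentage of token positions where the feature's activation is strictly above zero; the page does not define the term. Under that reading, 0.143% means the feature fires on roughly 1 token in 700. The function name `activation_density`, the `threshold` parameter, and the toy data are hypothetical.

```python
def activation_density(activations, threshold=0.0):
    """Percentage of token positions whose activation exceeds `threshold` (assumed definition)."""
    if not activations:
        raise ValueError("need at least one activation")
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)

# Toy data: 2,000 token positions, 3 of which activate the feature.
acts = [0.0] * 2000
acts[10] = 1.2
acts[500] = 0.4
acts[1999] = 2.5
print(f"{activation_density(acts):.3f}%")  # → 0.150%
```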

    No Known Activations