INDEX
    Explanations

    pronouns followed by specific contexts

    Negative Logits
     =            0.64
     of           0.63
     is           0.60
    ü             0.57
    ty            0.55
     fumar        0.54
    ya            0.53
    var           0.52
    om            0.52
    ia            0.52
    Positive Logits
     basaltes     0.54
     έχουν        0.52
    ಸ್ಟ           0.52
    (not shown)   0.50
     twierd       0.50
    (not shown)   0.49
     जेसीबी       0.48
     entraîne     0.47
    𓍊             0.47
    (not shown)   0.47
    Activation Density: 0.000%

    No Known Activations