INDEX
    Explanations

    mentions of the name "Karl" with varying levels of activation

    Negative Logits
    Token           Logit
    "flix"          -0.77
    "ndra"          -0.71
    " NX"           -0.69
    " Dangerous"    -0.67
    " Called"       -0.66
    "nder"          -0.64
    "LV"            -0.63
    " Doct"         -0.62
    " Mub"          -0.61
    " Karma"        -0.60
    Positive Logits
    Token           Logit
    "ounge"         1.15
    "anguage"       1.10
    "ophone"        1.02
    "owship"        1.00
    "ength"         0.94
    "oths"          0.91
    "ibrary"        0.91
    "ottesville"    0.91
    "otta"          0.90
    "atan"          0.89
    Activation Density: 0.019%

    No Known Activations