INDEX
    Explanations

    terms related to factual information or reality

    Negative Logits
    " Beware"     -0.76
    "wich"        -0.70
    "zy"          -0.68
    " Azerb"      -0.68
    " surely"     -0.67
    "limit"       -0.66
    "Gate"        -0.64
    " wisely"     -0.64
    "fu"          -0.63
    "nan"         -0.61
    Positive Logits
    "ity"         1.06
    "izable"      1.04
    "izations"    1.03
    "isation"     0.99
    "ities"       0.92
    "idad"        0.91
    "ITY"         0.90
    "ignment"     0.88
    "isations"    0.88
    "ization"     0.87
    Act Density 0.073%

    No Known Activations