INDEX
    Explanations

    the term "actual" or variations of it in different contexts

    Negative Logits
    Token           Logit
    " Beware"       -0.75
    "wich"          -0.75
    "zy"            -0.69
    " Azerb"        -0.68
    "nan"           -0.66
    "Gate"          -0.63
    " surely"       -0.63
    "ervative"      -0.62
    "limit"         -0.62
    " Vaugh"        -0.62

    Positive Logits
    Token           Logit
    "ity"           1.00
    "isation"       0.99
    "izations"      0.98
    "izable"        0.97
    "ities"         0.91
    "ignment"       0.88
    "isations"      0.88
    "idad"          0.86
    "ITY"           0.85
    " malice"       0.82
    Act Density 0.015%

    No Known Activations