INDEX
    Explanations

    comparisons between different options or choices

    comparisons between two items or concepts

    Negative Logits
    olog        -0.88
    ERN         -0.80
    mberg       -0.78
    shire       -0.76
    overed      -0.74
    lied        -0.72
    unes        -0.72
    Synopsis    -0.70
    seed        -0.70
    ocratic     -0.69
    Positive Logits
     mindset    0.68
    hill        0.66
    pecting     0.63
     averages   0.62
     scarcity   0.62
     linear     0.62
     bandits    0.61
     nil        0.60
     expend     0.60
     underdog   0.59
    Act Density 0.016%

    No Known Activations