INDEX
    Negative Logits
    im     0.79
    utt    0.79
    ot     0.76
    on     0.75
    ase    0.74
    re     0.74
    in     0.74
    u      0.73
    ph     0.72
    о      0.72
    Positive Logits
     camaraderie   0.82
     badass        0.75
     whimsical     0.75
     idyllic       0.73
     populist      0.73
     prosperity    0.71
     tenets        0.71
     mindset       0.70
     patriotic     0.70
     benevolent    0.70
    Activation Density: 0.000%

    No Known Activations