INDEX
    Explanations

    words related to lists or items in a list

    Negative Logits
     reluct      -0.67
     Godt        -0.66
     horrend     -0.66
     unspeak     -0.65
     spartan     -0.64
     cuck        -0.62
     spind       -0.60
     enthusi     -0.59
     celtic      -0.59
     apprehen    -0.59

    Positive Logits
     •            0.87
                  0.82
    .•            0.71
    )•            0.69
    ••            0.68
    ("")]         0.64
    °•            0.61
     ·            0.58
    ~•            0.58
    (::           0.57
    Activation Density 0.108%

    No Known Activations