INDEX
    Explanations

    phrases related to a specific organization or brand

    references to specific organizations or entities, particularly related to academia or research

    Negative Logits
    " panc"        -0.84
    " Hillary"     -0.72
    " soup"        -0.72
    " Columbus"    -0.68
    " Chop"        -0.68
    " Bos"         -0.67
    " Dunk"        -0.65
    " Vers"        -0.65
    " Tunis"       -0.63
    " boiled"      -0.63
    Positive Logits
    "RL"            4.68
    "rl"            1.63
    " RL"           1.62
    "RS"            1.44
    "JR"            1.33
    "LR"            1.31
    "RC"            1.30
    "SL"            1.25
    "LL"            1.22
    "ARA"           1.18
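    The two logit lists above rank vocabulary tokens by how strongly this feature pushes their output logits up or down. A minimal sketch of one common way such lists are produced, assuming (this page does not confirm it) that each score is the dot product of the feature's decoder direction with a token's unembedding vector. The helpers `feature_logits` and `top_logits` and the toy tokens and vectors are hypothetical.

```python
from typing import Dict, List, Tuple

Pair = Tuple[str, float]


def feature_logits(decoder_dir: List[float],
                   unembed: Dict[str, List[float]]) -> Dict[str, float]:
    """Score each token as the dot product of the (assumed) feature
    decoder direction with that token's unembedding vector."""
    return {tok: sum(d * u for d, u in zip(decoder_dir, vec))
            for tok, vec in unembed.items()}


def top_logits(scores: Dict[str, float], k: int) -> Tuple[List[Pair], List[Pair]]:
    """Return the k highest-scoring and k lowest-scoring (token, score) pairs."""
    ranked = sorted(scores.items(), key=lambda kv: kv[1])  # ascending
    negative = ranked[:k]            # most negative first
    positive = ranked[::-1][:k]      # most positive first
    return positive, negative


# Toy 2-dimensional example; tokens and vectors are made up.
unembed = {"tok_a": [2.0, 1.0], "tok_b": [-1.0, 0.5], "tok_c": [0.5, 0.5]}
decoder_dir = [1.0, 0.5]
scores = feature_logits(decoder_dir, unembed)
positive, negative = top_logits(scores, k=1)
print(positive, negative)
```

    Sorting once and slicing both ends keeps the sketch short; for a real vocabulary of tens of thousands of tokens, `heapq.nlargest` and `heapq.nsmallest` would avoid a full sort.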
    Activation Density: 0.010%
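    A density of 0.010% means the feature fires on roughly 1 in every 10,000 tokens (0.010% = 0.0001). A minimal sketch of that calculation, assuming density is the percentage of token positions whose activation is strictly above zero; the helper name `activation_density` and its `threshold` parameter are hypothetical.

```python
from typing import List


def activation_density(activations: List[float], threshold: float = 0.0) -> float:
    """Percentage of token positions where the feature's activation
    exceeds `threshold` (assumed to be 0 for a sparse feature)."""
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


# Made-up activations: one active position out of 10,000 tokens.
acts = [0.0] * 9999 + [3.2]
print(f"{activation_density(acts):.3f}%")
```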

    No Known Activations