INDEX
    Explanations

    mentions of famous individuals or specific topics in various fields, potentially related to current events

    proper nouns and specific entities, often in the context of questions or discussions about them

    Negative Logits
    ggles        -0.89
    details      -0.77
    edIn         -0.72
    çīĪ          -0.69
    roups        -0.68
    çļ           -0.68
    irts         -0.67
    ":"",        -0.67
    ook          -0.66
    udes         -0.65

    Positive Logits
     supposed     1.27
     gonna        1.06
     worth        1.01
     able         1.00
     really       1.00
     contagious   0.98
     ready        0.97
     going        0.96
     aware        0.95
     REALLY       0.94
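    The two lists rank tokens by how strongly this feature pushes their logits down or up. A minimal sketch of how such ranked lists could be produced, assuming one precomputed score per vocabulary token is already available; the `top_and_bottom` helper and the `sample` dict are hypothetical, and the sample scores are copied from the lists.

```python
import heapq


def top_and_bottom(
    logit_effects: dict[str, float], k: int
) -> tuple[list[tuple[str, float]], list[tuple[str, float]]]:
    """Return the k most negative and the k most positive (token, score) pairs.

    Hypothetical helper: assumes one precomputed logit effect per token.
    """
    # k lowest scores, most negative first
    bottom = heapq.nsmallest(k, logit_effects.items(), key=lambda kv: kv[1])
    # k highest scores, most positive first
    top = heapq.nlargest(k, logit_effects.items(), key=lambda kv: kv[1])
    return bottom, top


# A few scores taken from the lists (leading spaces are part of the tokens)
sample = {
    "ggles": -0.89,
    "details": -0.77,
    "edIn": -0.72,
    " supposed": 1.27,
    " gonna": 1.06,
    " worth": 1.01,
}

neg, pos = top_and_bottom(sample, k=2)
print(neg)  # [('ggles', -0.89), ('details', -0.77)]
print(pos)  # [(' supposed', 1.27), (' gonna', 1.06)]
```

    `heapq.nsmallest`/`nlargest` avoid sorting the whole vocabulary when only the top ten entries are needed.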
    Activation Density: 0.114%
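    Activation density is reported as a percentage. Assuming it means the share of sampled tokens on which the feature's activation is strictly positive (an assumption about the dashboard's definition, not confirmed here), it reduces to a simple ratio. The `activation_density` function and the example data below are hypothetical.

```python
def activation_density(activations: list[float]) -> float:
    """Percentage of tokens with a strictly positive activation.

    Assumes `activations` holds one feature activation per token; the > 0
    threshold is an assumption about how firing is counted.
    """
    if not activations:
        return 0.0
    firing = sum(1 for a in activations if a > 0)
    return 100.0 * firing / len(activations)


# Hypothetical example: 114 firing tokens out of 100,000 gives 0.114%
acts = [0.0] * 100_000
for i in range(114):
    acts[i] = 1.0
print(f"{activation_density(acts):.3f}%")  # 0.114%
```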

    No Known Activations