    Explanations

    mentions of specific people or public figures, particularly on social media

    Negative Logits
    ''.     -0.87
    ──      -0.76
    ".[     -0.76
    .).     -0.75
    .",     -0.74
    ]."     -0.74
    ).[     -0.73
    .」      -0.71
    .�      -0.71
    �       -0.69
    Positive Logits
     @            1.11
    Jr            1.05
     Originally   0.97
     congr        0.93
     Thanks       0.87
    steen         0.87
    afort         0.84
    why           0.83
     Yep          0.83
    _             0.82
    Activation Density 0.075%

    No Known Activations