    Explanations

    rankings or lists of items within different categories

    phrases indicating rankings or lists of top items

    Negative Logits
    Token           Logit
    "norm"          -0.84
    "icum"          -0.83
    "redo"          -0.83
    "roth"          -0.81
    "protection"    -0.76
    "arantine"      -0.75
    "limited"       -0.75
    "amar"          -0.74
    "athered"       -0.74
    "amination"     -0.73

    Positive Logits
    Token           Logit
    "Favorite"      1.02
    " Worst"        0.86
    " Influ"        0.85
    " quotes"       0.79
    " celeb"        0.76
    " Bucket"       0.76
    " unsolved"     0.76
    " Celebrity"    0.76
    " Places"       0.75
    " Songs"        0.74

    Activation Density: 0.305%

    No Known Activations