INDEX
    Explanations

    mentions of the word "popularity" with varying degrees of emphasis

    references to the concept of popularity

    Negative Logits
    "erm"          -0.77
    "INAL"         -0.73
    "uran"         -0.73
    " Dull"        -0.66
    " intest"      -0.66
    " Shell"       -0.65
    " Fury"        -0.63
    " Neurolog"    -0.61
    "inis"         -0.61
    "¯¯¯¯"         -0.59
    Positive Logits
    "ately"        0.97
    "ability"      0.90
    "ously"        0.86
    "Reviewer"     0.81
    " quo"         0.76
    "itious"       0.75
    "itism"        0.73
    "rise"         0.73
    "uation"       0.71
    "acy"          0.71
    Activation Density: 0.038%

    No Known Activations