INDEX
    Explanations

    specific mentions of the word "gorillas"

    Negative Logits
    Token            Logit
    " Hath"          -0.76
    "çĦ"             -0.72
    " pree"          -0.68
    "NC"             -0.65
    "ALE"            -0.64
    "Adv"            -0.64
    " Disability"    -0.63
    "ŀ"              -0.63
    "nder"           -0.62
    "tie"            -0.62
    Positive Logits
    Token            Logit
    "illas"          1.48
    "terday"         1.09
    "unta"           0.90
    "cules"          0.88
    "xon"            0.83
    "uca"            0.83
    "ques"           0.83
    "ervatives"      0.82
    "ervative"       0.79
    "emonium"        0.78
    Activation Density: 0.006%

    No Known Activations