    Explanations

    mentions or references to specific individuals

    Negative Logits
    "cffff"          -0.74
    "iller"          -0.71
    "ilon"           -0.69
    "avorite"        -0.67
    " whiff"         -0.66
    "oppable"        -0.62
    " resisted"      -0.61
    " resistance"    -0.60
    "é¾įå"           -0.58
    "cape"           -0.58
    Positive Logits
    "irect"          0.86
    "reference"      0.79
    "rers"           0.79
    "rences"         0.78
    "itatively"      0.75
    "minist"         0.73
    " Reference"     0.73
    "entious"        0.71
    "ename"          0.71
    "ãĥĥ"            0.70
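The page does not say how these logit lists were produced. Dashboards of this kind commonly project the feature's decoder direction through the model's unembedding matrix and sort the per-token results, and the sketch below assumes exactly that. The helpers `logit_effects` and `top_logits`, the two-dimensional toy vectors, and their values are hypothetical and chosen only for illustration; they are not taken from this feature.

```python
def logit_effects(decoder_dir, unembed):
    """Assumed method: dot product of the feature's decoder direction with
    each token's unembedding vector (a toy stand-in for columns of W_U)."""
    return {
        tok: sum(d * u for d, u in zip(decoder_dir, vec))
        for tok, vec in unembed.items()
    }


def top_logits(effects, k):
    """Return the k most negative and k most positive (token, value) pairs."""
    ranked = sorted(effects.items(), key=lambda kv: kv[1])  # ascending by value
    negative = ranked[:k]
    positive = ranked[::-1][:k]
    return negative, positive


if __name__ == "__main__":
    # Hypothetical toy data, not this feature's real weights.
    decoder_dir = [1.0, -0.5]
    unembed = {
        "reference": [0.6, -0.4],
        " resisted": [-0.5, 0.2],
        "cape": [0.1, 0.3],
    }
    neg, pos = top_logits(logit_effects(decoder_dir, unembed), k=1)
    print("most negative:", neg)
    print("most positive:", pos)
```

Sorting once and slicing both ends keeps the negative and positive lists consistent with each other. With a real vocabulary you would use a vectorized top-k over the full logit vector instead.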
    Activation density: 0.765%
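The page does not define activation density. A common reading is the percentage of sampled tokens on which the feature fires, meaning its activation is above zero. Under that assumption, 0.765% corresponds to roughly 7.65 firing tokens per 1,000. The helper `activation_density` and its `threshold` parameter below are hypothetical:

```python
def activation_density(activations, threshold=0.0):
    """Percentage of tokens whose activation exceeds `threshold`.

    Assumption: density is defined as the fraction of sampled tokens
    with activation strictly greater than zero.
    """
    if not activations:
        raise ValueError("need at least one activation value")
    firing = sum(1 for a in activations if a > threshold)
    return 100.0 * firing / len(activations)


if __name__ == "__main__":
    # Hypothetical sample: 1,530 firing tokens out of 200,000.
    acts = [0.0] * 198_470 + [0.3] * 1_530
    print(f"{activation_density(acts):.3f}%")
```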

    No Known Activations