INDEX
    Explanations

    words related to detecting or finding clues

    Negative Logits
    Token           Logit
    "ategory"       -0.83
    "oslav"         -0.70
    "rior"          -0.68
    "atism"         -0.68
    "fare"          -0.66
    "kus"           -0.64
    "rifice"        -0.62
    " neighb"       -0.62
    "È"             -0.61
    "lav"           -0.60

    Positive Logits
    Token           Logit
    " clue"         1.16
    " hint"         1.14
    " clues"        1.10
    " hints"        0.98
    " glean"        0.95
    "tale"          0.71
    " illuminate"   0.70
    " pointing"     0.70
    " glimps"       0.69
    "ibly"          0.69

    Activation Density: 0.033%

    No Known Activations