    Explanations

    phrases that express specific quantities or degrees of actions or feelings

    Negative Logits
    Token             Logit
    "icker"           -0.15
    "fad"             -0.15
    "TestCategory"    -0.14
    "ĽĪ"              -0.14
    ":///"            -0.14
    "ÑĤаÑħ"          -0.14
    "agal"            -0.14
    "yon"             -0.14
    "師"              -0.14
    "agogue"          -0.13
    Positive Logits
    Token             Logit
    " liking"         0.27
    " cue"            0.26
    " cues"           0.25
    " beating"        0.25
    " stance"         0.24
    " toll"           0.23
    " risks"          0.23
    " step"           0.23
    " look"           0.22
    " baths"          0.22
    Activation Density: 0.058%

    No Known Activations