INDEX
    Explanations
    Negative Logits
    " fear"             -1.19
    " feared"           -1.12
    "fear"              -1.11
    " fearing"          -1.09
    " afraid"           -1.09
    " fearful"          -1.09
    " Fear"             -1.08
    "Fear"              -1.08
    "RegressionTest"    -1.08
    " fears"            -0.99
    Positive Logits
    "ful"       0.75
    " of"       0.60
    "y"         0.55
    "fully"     0.47
    "halb"      0.45
    "ver"       0.44
    " about"    0.43
    "full"      0.43
    "hof"       0.43
    "ure"       0.42
    Activation Density: 0.029%

    No Known Activations