INDEX
    Explanations

    phrases indicating approval or praise

    phrases indicating a positive assessment or state of being

    Negative Logits
    "hyde"          -0.91
    "rush"          -0.75
    "atto"          -0.72
    "ategory"       -0.71
    "hip"           -0.71
    "ataka"         -0.68
    " furiously"    -0.66
    "illary"        -0.64
    "iferation"     -0.63
    "ngth"          -0.63

    Positive Logits
    "enough"         1.10
    " suited"        1.00
    " enough"        0.98
    " behaved"       0.92
    "spring"         0.91
    "baum"           0.80
    "wired"          0.76
    " Enough"        0.76
    " positioned"    0.76
    "Known"          0.76

    Activation Density: 0.041%

    No Known Activations