    Explanations

    words related to behavioral observations and changes

    Negative Logits
    " Laurens"    -0.84
    " Nip"        -0.79
    " Milne"      -0.78
    " fairest"    -0.71
    "tigma"       -0.70
    " Donne"      -0.70
    " Tup"        -0.69
    " Lons"       -0.69
    "risten"      -0.69
    " glomer"     -0.69
    Positive Logits
    " behavior"     1.80
    " behaviour"    1.69
    " behaviors"    1.60
    " Behavior"     1.57
    "behavior"      1.56
    " behaviours"   1.51
    " BEHAVIOR"     1.47
    "behaviour"     1.44
    "Behavior"      1.43
    " Behaviour"    1.42
    Activation density: 0.117%

    No Known Activations