    Explanations

    phrases related to depth or intensity

    Negative Logits
    Token          Logit
    " Slag"        -0.56
    " encomp"      -0.54
    " apprehen"    -0.54
    " affor"       -0.52
    " attemp"      -0.49
    " Böh"         -0.49
    " Blat"        -0.47
    " erad"        -0.46
    "crus"         -0.46
    " osu"         -0.46
    Positive Logits
    Token          Logit
    " deep"        1.14
    " Deep"        1.10
    "Deep"         1.10
    "deep"         1.09
    " DEEP"        1.01
    " depths"      0.96
    "DEEP"         0.95
    "depth"        0.94
    " depth"       0.93
    " deeper"      0.92
    Activation Density: 0.102%

    No Known Activations