    Explanations

    adjectives and descriptive phrases that suggest contrast or complexity

    Negative Logits
        Token                Logit
        [not recovered]      -0.97
        itſelf               -0.96
        houſe                -0.93
        purpoſe              -0.91
        Houſe                -0.90
        Jefus                -0.88
        ſche                 -0.85
        Conſ                 -0.84
        pleaſure             -0.84
        Efq                  -0.83
    Positive Logits
        Token                Logit
        nakalista            0.55
        in                   0.55
        CWE                  0.55
        it                   0.52
        ,                    0.51
        since                0.49
        !                    0.47
        ex                   0.46
        best                 0.46
        for                  0.46
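    Dashboards of this kind commonly derive the two logit lists by projecting the feature's decoder direction onto the model's unembedding vectors and reading off the most negative and most positive entries. The sketch below assumes that convention; the function names `logit_effects` and `top_and_bottom` and the toy vectors are hypothetical, not this dashboard's code or the real model weights.

```python
from typing import Dict, List, Tuple

# Hypothetical sketch: a feature's logit effect on a token is ASSUMED to be the
# dot product of the feature's decoder direction with that token's unembedding vector.
def logit_effects(direction: List[float], unembed: Dict[str, List[float]]) -> Dict[str, float]:
    """Return the assumed direct logit effect of the feature on every token."""
    return {
        token: sum((d * u for d, u in zip(direction, vec)), 0.0)
        for token, vec in unembed.items()
    }


def top_and_bottom(
    effects: Dict[str, float], k: int
) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Split effects into the k most negative and the k most positive tokens."""
    ranked = sorted(effects.items(), key=lambda item: item[1])
    negative = ranked[:k]        # most negative first, like "Negative Logits"
    positive = ranked[::-1][:k]  # most positive first, like "Positive Logits"
    return negative, positive


# Toy 2-d example with made-up numbers (not the real model).
effects = logit_effects(
    [1.0, -0.5],
    {"a": [0.5, 0.0], "b": [0.0, 1.0], "c": [-1.0, 0.0], "d": [0.2, -0.4]},
)
negative, positive = top_and_bottom(effects, 2)
print(negative)  # c, then b
print(positive)  # a, then d
```

    With real weights, the direction would be the feature's row of the SAE decoder and the unembedding would cover the full vocabulary; both are assumptions about how these values were produced.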
    Activation Density: 0.511%
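    An activation density of 0.511% means the feature fires on roughly 5 of every 1,000 sampled tokens. A minimal sketch of the arithmetic, assuming density is the percentage of token positions whose activation is strictly above zero (the zero threshold and the helper name `activation_density` are assumptions, not the dashboard's documented definition):

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Percentage of token positions where the feature activation exceeds `threshold`.

    Assumption: a feature "fires" when its activation is strictly above the threshold.
    """
    if len(activations) == 0:
        raise ValueError("need at least one activation")
    firing = sum(1 for a in activations if a > threshold)
    return 100.0 * firing / len(activations)


# 511 firing positions out of 100,000 reproduces the density shown above.
acts = [0.0] * 99_489 + [1.0] * 511
print(activation_density(acts))  # 0.511
```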

    No Known Activations