    Explanations

    phrases that express opinions or perspectives

    Negative Logits
    " Probe"    -0.15
    "YK"        -0.15
    "ptions"    -0.14
    "ãĤ¢ãĥ¼"    -0.14
    " Barcl"    -0.14
    "à¥ģह"      -0.13
    "ären"      -0.13
    "porto"     -0.13
    "ctors"     -0.13
    "eba"       -0.13
    Positive Logits
    "infer"     0.16
    "IFF"       0.15
    "meter"     0.14
    " Pastor"   0.14
    "icho"      0.14
    " комÑĥ"    0.14
    "ê¶Į"       0.14
    "ùi"        0.14
    "å¨ĺ"       0.13
    "ÑĤÑİ"      0.13
    Act Density 0.022%

    No Known Activations