INDEX
    Explanations

    phrases that start with "about," indicating a focus on topics of discussion or inquiry

    Negative Logits
    izable
    -0.17
    fault
    -0.16
    ities
    -0.15
    .scalablytyped
    -0.15
    argout
    -0.14
    и
    -0.13
    ับม
    -0.13
    heim
    -0.13
    rot
    -0.13
    として
    -0.13
    Positive Logits
    /from
    0.21
    -NLS
    0.20
     طريق
    0.20
    -face
    0.17
    /to
    0.17
    lying
    0.17
    avia
    0.16
    /by
    0.16
    (predicate
    0.16
     how
    0.16
    Act Density 0.147%

    No Known Activations