    Explanations

    Phrases describing an attempt, in which the word "trying" is followed by a verb.

    Negative Logits
    "RegressionTest"    -1.02
    "LookAnd"           -0.94
    " متعلقه"           -0.94
    "Geplaatst"         -0.85
    "WireFormatLite"    -0.84
    " betweenstory"     -0.83
    "PreferredItem"     -0.82
    " ویکی‌پدی"         -0.81
    "Попис"             -0.80
    " estekak"          -0.78
    Positive Logits
    " particularly"     0.47
    " significance"     0.45
    " importance"       0.45
    " gus"              0.45
    "восто"             0.44
    "ستون"              0.44
    " öne"              0.44
    " frozen"           0.42
    " to"               0.42
    " useHistory"       0.41
    Activation density: 0.012%

    No known activations.