INDEX
    Explanations

    noticing unusual or surprising things

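    (Tokens are quoted; a leading space inside the quotes is part of the token.)

    Negative Logits
      Token                Value
      " unfortunately"     0.62
      " malheureusement"   0.61   (French: unfortunately)
      "Unfortunately"      0.57
      "heureusement"       0.54   (French: fortunately)
      " Unfortunately"     0.54
      "unfortunately"      0.53
      " aufgrund"          0.50   (German: due to)
      " fortunately"       0.49
      "Sadly"              0.49
      " दुर्भाग्य"           0.48   (Hindi: misfortune)

    Positive Logits
      Token                Value
      "明明"               0.73   (Chinese: clearly, even though)
      " seem"              0.70
      " seemingly"         0.67
      "seem"               0.66
      " none"              0.64
      " почти"             0.64   (Russian: almost)
      " semblent"          0.63   (French: seem)
      " Neither"           0.63
      " despite"           0.61
      "好像"               0.61   (Chinese: seems, as if)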
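    The logit lists rank vocabulary tokens by how strongly this feature pushes their logits up or down. One common way such lists are produced, assumed here and not confirmed by this page, is to project the feature's decoder direction through the model's unembedding matrix and take the top and bottom k scores. The sketch below ignores the final LayerNorm and any bias; `top_logit_effects`, `w_dec`, and `W_U` are hypothetical names, and the toy vocabulary and weights are illustrative only.

```python
import numpy as np

def top_logit_effects(w_dec, W_U, vocab, k=3):
    """Rank tokens by the direct logit effect of one feature.

    Assumption (hypothetical helper): the effect is approximated as
    w_dec @ W_U, with no final LayerNorm and no bias, giving one score
    per vocabulary token. Returns (positive, negative) lists of
    (token, score) pairs, strongest first.
    """
    scores = w_dec @ W_U                      # shape: (vocab_size,)
    order = np.argsort(scores)                # indices, ascending by score
    negative = [(vocab[i], float(scores[i])) for i in order[:k]]
    positive = [(vocab[i], float(scores[i])) for i in order[::-1][:k]]
    return positive, negative

# Toy example: hypothetical sizes and random weights, not real model data.
rng = np.random.default_rng(0)
vocab = ["seem", "despite", "none", "unfortunately", "sadly", "the"]
d_model = 8
W_U = rng.normal(size=(d_model, len(vocab)))  # unembedding: d_model x vocab
w_dec = rng.normal(size=d_model)              # one feature's decoder direction
pos, neg = top_logit_effects(w_dec, W_U, vocab, k=2)
print("positive:", pos)
print("negative:", neg)
```

    Sorting once and reading both ends of the order gives the positive and negative lists in a single pass over the scores.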
    Act Density 0.006%
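    Activation density is usually read as the fraction of token positions on which the feature fires (activation above zero); that reading is an assumption here. Under it, 0.006% corresponds to roughly 6 firing tokens per 100,000. The helper `activation_density` below is hypothetical and the activations are synthetic.

```python
import numpy as np

def activation_density(activations, threshold=0.0):
    """Fraction of token positions whose activation exceeds `threshold`.

    Assumption (hypothetical helper): 'Act Density' is this fraction,
    reported as a percentage.
    """
    acts = np.asarray(activations, dtype=float)
    return float((acts > threshold).mean())

# Synthetic activations: the feature fires on 6 of 100,000 tokens.
acts = np.zeros(100_000)
acts[:6] = 1.0
density = activation_density(acts)
print(f"{density:.3%}")  # formats the fraction as a percentage
```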

    No Known Activations