    Negative Logits
    -0.44
     taste
    -0.43
     temporary
    -0.42
    taste
    -0.42
    -0.40
    -0.40
    temporary
    -0.39
    styleType
    -0.38
     disemb
    -0.38
    -0.37
    Positive Logits
    Token               Logit
    " late"             0.64
    " EconPapers"       0.62
    " Late"             0.62
    "Late"              0.60
    "曖昧さ回避"          0.59
    "late"              0.57
    " landscape"        0.55
    " healthcare"       0.54
    " Healthcare"       0.51
    " autorytatywna"    0.51
    Activation Density: 0.139%

    No Known Activations