    Negative Logits
    Token               Logit
    "bilt"              -0.78
    "âĹ¼"               -0.75
    " Nanto"            -0.72
    "soType"            -0.70
    " Warning"          -0.69
    "¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯"  -0.68
    "CLASSIFIED"        -0.68
    "ãĥĵ"               -0.68
    "ãģ®ç"              -0.67
    "interstitial"      -0.67
    Positive Logits
    Token               Logit
    " mere"             0.96
    " merely"           0.90
    " simply"           0.90
    " superficial"      0.84
    " simple"           0.79
    " oneself"          0.76
    " partisans"        0.73
    " brute"            0.73
    " aesthetics"       0.72
    " cosmetic"         0.72
    Act Density 0.105%

    No Known Activations
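
For readers unfamiliar with the "Act Density" figure, the following is a minimal sketch of one common way such a number is computed: the fraction of token positions on which the feature's activation is above zero. This definition, the function name `activation_density`, the `threshold` parameter, and the sample data are all assumptions made for illustration; the source does not state how its 0.105% value was derived.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of positions whose activation exceeds `threshold`.

    Assumed definition for illustration only; the dashboard's exact
    formula is not given in the source.
    """
    if not activations:
        return 0.0
    fired = sum(1 for a in activations if a > threshold)
    return fired / len(activations)


# Hypothetical data: 2000 token positions, the feature fires on 2 of them.
sample = [0.0] * 1998 + [0.7, 1.3]
density = activation_density(sample)
print(f"{density:.3%}")
```

Under this reading, a density of 0.105% would mean the feature is active on roughly one token in a thousand.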