    Explanations

    References to mirrors and reflections in various contexts, often relating to self-image or perception.

    Negative Logits
    Token            Logit
    ']));'           -0.54
    'geschlossen'    -0.50
    'lieu'           -0.50
    '`,'             -0.50
    ' "));'          -0.49
    '?";'            -0.49
    'ruff'           -0.49
    'enment'         -0.49
    '")));'          -0.48
    '今から'         -0.48

    Positive Logits
    Token            Logit
    'Mirrors'        1.15
    ' Mirrors'       1.15
    ' mirror'        1.12
    ' Mirror'        1.11
    ' mirrors'       1.10
    ' espejo'        0.99
    'Mirror'         0.96
    ' miroir'        0.96
    ' MIRROR'        0.95
    ' specchio'      0.95

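
Logit lists like the ones above are commonly produced by projecting a feature's decoder direction through the model's unembedding matrix and ranking the vocabulary by the result. Whether this page was generated exactly that way is an assumption; the page itself does not say. A minimal Python sketch, where `top_logit_tokens`, `decoder_direction`, `unembed`, and the toy values are all hypothetical:

```python
def top_logit_tokens(decoder_direction, unembed, vocab, k=2):
    """Rank vocabulary tokens by their logit effect for one feature.

    decoder_direction: list of d_model floats (hypothetical feature direction)
    unembed: d_model rows, each a list of vocab_size floats
    vocab: list of vocab_size token strings
    """
    # Logit effect per token: dot product of the direction with that token's column.
    effects = [
        sum(d * row[j] for d, row in zip(decoder_direction, unembed))
        for j in range(len(vocab))
    ]
    # Token indices sorted from most negative to most positive effect.
    order = sorted(range(len(vocab)), key=lambda j: effects[j])
    negative = [(vocab[j], effects[j]) for j in order[:k]]
    positive = [(vocab[j], effects[j]) for j in order[::-1][:k]]
    return positive, negative


# Toy data for illustration only; not taken from any real model.
toy_vocab = ["tok_a", "tok_b", "tok_c", "tok_d"]
toy_unembed = [
    [2.0, 1.0, -1.0, -3.0],
    [0.5, 0.5, 0.5, 0.5],
]
toy_direction = [1.0, 0.0]

positive, negative = top_logit_tokens(toy_direction, toy_unembed, toy_vocab)
print("positive:", positive)
print("negative:", negative)
```

Tokens that differ only by a leading space or by capitalization are separate vocabulary entries, which is why several variants of the same word can appear in one list.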
    Activation Density: 0.287%
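
Activation density is usually read as the fraction of sampled tokens on which the feature fires at all. The sketch below assumes that definition, with "fires" meaning an activation above a threshold of zero. The function name `activation_density` and the synthetic sample are hypothetical and chosen only to show what a value of this size looks like.

```python
def activation_density(activations, threshold=0.0):
    """Fraction of token positions whose activation exceeds `threshold`."""
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Synthetic sample: 100,000 token positions, 287 of them active.
sample = [1.0] * 287 + [0.0] * (100_000 - 287)
density = activation_density(sample)
print(f"{density:.3%}")
```

Under this reading, a density of 0.287% corresponds to roughly 3 active tokens per 1,000.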

    No Known Activations