Explanations
references to mirrors and reflections in various contexts, often relating to self-image or perceptions
Negative Logits
Token          Logit
]));           -0.54
geschlossen    -0.50
lieu           -0.50
`,             -0.50
"));           -0.49
?";            -0.49
ruff           -0.49
enment         -0.49
")));          -0.48
今から           -0.48
Positive Logits
Token          Logit
Mirrors        1.15
Mirrors        1.15
mirror         1.12
Mirror         1.11
mirrors        1.10
espejo         0.99
Mirror         0.96
miroir         0.96
MIRROR         0.95
specchio       0.95
Activations Density 0.287%