INDEX
Explanations
concepts relating to reflection and introspection
NEGATIVE LOGITS
енка    -0.07
nesc    -0.07
terra   -0.07
ephy    -0.07
arkin   -0.07
ovní    -0.07
mods    -0.07
ymes    -0.07
æ¯      -0.07
tera    -0.06
POSITIVE LOGITS
reflection    0.14
Reflection    0.13
reflections   0.13
reflection    0.13
mirrors       0.13
mirror        0.13
Reflection    0.13
Mirror        0.12
reflected     0.12
reflect       0.11
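For readers unfamiliar with listings like the ones above, here is a minimal sketch of how such top-token lists are commonly derived. It assumes, without confirmation from this page, that each value is the feature's output direction projected through the model's unembedding matrix. The names `top_logit_tokens`, `direction`, `unembed`, and `vocab` are hypothetical and chosen only for illustration.

```python
import numpy as np

def top_logit_tokens(direction, unembed, vocab, k=10):
    """Return the k most negative and k most positive (token, value) pairs.

    Assumption: `direction` is a feature's output vector of shape (d_model,),
    `unembed` is a (d_model, vocab_size) matrix, and `vocab` maps a token id
    to its string. None of these come from the page itself.
    """
    effects = np.asarray(direction) @ np.asarray(unembed)  # one value per token
    order = np.argsort(effects)                            # ascending order
    negative = [(vocab[i], float(effects[i])) for i in order[:k]]
    positive = [(vocab[i], float(effects[i])) for i in order[::-1][:k]]
    return negative, positive

# Toy example with a three-token vocabulary.
toy_vocab = ["reflection", "terra", "the"]
toy_unembed = [[0.14, -0.07, 0.0],
               [0.0, 0.0, 0.0]]
neg, pos = top_logit_tokens([1.0, 0.0], toy_unembed, toy_vocab, k=1)
```

Under this reading, tokens such as "reflection" and "mirror" are the ones the feature pushes the model toward, while the negative list holds the tokens it most suppresses.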
Activations Density 0.021%
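As a rough illustration of the density figure, the sketch below treats it as the fraction of sampled tokens on which the feature has a nonzero activation. That definition is an assumption, as are the function name `activation_density` and the `threshold` parameter; the page does not state how the value was computed.

```python
def activation_density(activations, threshold=0.0):
    """Fraction of tokens whose activation exceeds `threshold`.

    Assumption: density means "share of tokens where the feature fires";
    the page itself does not define the term.
    """
    if not activations:
        return 0.0
    firing = sum(1 for a in activations if a > threshold)
    return firing / len(activations)

# Toy example: the feature fires on 2 of 10,000 tokens.
sample = [0.0] * 9998 + [1.5, 0.3]
density = activation_density(sample)
print(f"{density:.3%}")
```

A density this low would mean the feature is sparse: it is silent on nearly all tokens and fires only on a small, specific set.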