Explanations
objects becoming something else
Negative Logits
ते  0.41
concern  0.39
觥  0.38
rophes  0.38
arti  0.37
ర్థి  0.37
ells  0.37
ibold  0.37
𒋗  0.37
hace  0.37
Positive Logits
Clone  0.46
cloning  0.45
lup  0.44
lod  0.44
Clone  0.43
cloned  0.43
eyes  0.42
Lup  0.42
mystery  0.41
кло  0.41
Activations Density: 0.000%