Explanations
references to the concept of "world" across various contexts
Negative Logits
Token        Logit
ugier        -0.44
Clever       -0.42
uable        -0.41
복           -0.41
colas        -0.41
prepared     -0.40
⇑            -0.40
ingenious    -0.40
thông        -0.40
Safe         -0.40
Positive Logits
Token        Logit
world        0.81
sphere       0.81
mundos       0.80
orbit        0.77
wereld       0.76
world        0.75
sphere       0.74
worlds       0.72
esferas      0.72
worlds       0.71
Activations Density: 0.227%