Explanations
subject and object definitions
Negative Logits
disruptive  0.43
transgress  0.43
émotion  0.42
antisocial  0.42
原材料  0.41
✸  0.41
zákaz  0.40
kis  0.40
televisión  0.39
mascotas  0.39
Positive Logits
Determin  0.50
ed  0.46
world  0.45
Planned  0.43
Disk  0.42
Control  0.42
Heaven  0.41
Begin  0.41
Heavenly  0.41
Request  0.40
Activations Density: 0.011%