INDEX
Explanations
less than, eliminating, maximum distance, movement
Negative Logits
Vista      0.44
ᖃ          0.43
Avatar     0.42
Precio     0.41
Carlton    0.40
ить        0.40
Cookie     0.40
Liberty    0.40
Assassin   0.40
Ꮨ          0.40
POSITIVE LOGITS
sujet      0.52
estrange   0.48
anarch     0.46
explan     0.46
antif      0.46
sujeito    0.46
hinzuf     0.46
subjet     0.44
auxili     0.44
including  0.44
Activations Density 0.002%