INDEX
Explanations
instances of familial relationships and social dynamics
NEGATIVE LOGITS
orian     -0.19
æĢĸ       -0.14
INUX      -0.14
.ret      -0.14
ells      -0.14
iÄĻ       -0.14
kinson    -0.13
izo       -0.13
lest      -0.13
Wich      -0.13
POSITIVE LOGITS
ignore       0.45
ignored      0.45
disreg       0.44
defiance     0.44
disob        0.44
disregard    0.43
ignoring     0.43
ignores      0.40
def          0.38
ignore       0.38
Activations Density: 0.317%