Explanations
names of individuals and their relationships to various actions or characteristics
Negative Logits
Token      Logit
elder      -0.16
eg         -0.15
zew        -0.15
inton      -0.15
[Int       -0.15
defgroup   -0.15
oop        -0.15
hte        -0.15
atch       -0.14
anagan     -0.14
Positive Logits
Token      Logit
STRICT     0.15
cação      0.14
duct       0.14
Quadr      0.14
Dw         0.13
à¹Īà¸ģ     0.13
'&&        0.13
OTHERWISE  0.13
ết         0.13
hope       0.13
Activations Density 0.087%