INDEX
Explanations
actions related to financial transactions and social interactions
actions and behaviors related to social connections and interactions
Negative Logits
croft          -0.76
alon           -0.69
appa           -0.68
romeda         -0.68
sama           -0.68
ç«             -0.66
Parameter      -0.66
affer          -0.65
çͰ             -0.65
718            -0.65
Positive Logits
themselves      1.11
their           0.98
Their           0.86
THEIR           0.85
collectively    0.80
diets           0.79
increasingly    0.78
wives           0.74
their           0.73
extinct         0.73
Activations Density 0.670%