INDEX
Explanations
phrases related to decision-making or taking action
patterns of conditional statements and outcomes
Negative Logits
gross    -0.75
çͰ       -0.68
uca      -0.68
ritz     -0.67
aido     -0.66
Rap      -0.66
acts     -0.65
Rot      -0.64
ollo     -0.63
uga      -0.62
Positive Logits
mentality    0.85
syndrome     0.82
attitude     0.77
salesman     0.74
erness       0.69
lihood       0.67
-'           0.67
trope        0.66
sequel       0.66
ratio        0.66
Activations Density 0.178%