INDEX
Explanations
phrases related to behaviors or inclinations
phrases indicating a recurrent behavioral inclination
Negative Logits
mberg     -0.83
gur       -0.80
cutting   -0.78
hemat     -0.75
idden     -0.74
umm       -0.74
enic      -0.74
riel      -0.73
loo       -0.71
sung      -0.70
Positive Logits
tendency        1.20
tendencies      1.06
propensity      0.96
inconsistency   0.92
suscept         0.82
inference       0.81
aversion        0.80
inclination     0.79
predis          0.79
toward          0.75
Activations Density 0.008%