Explanations
motivations and actions related to indulgence, potentially in a negative or excessive manner
Negative Logits
è£ħ         -0.81
prototype   -0.74
ĺħ          -0.73
Shack       -0.69
ACP         -0.69
Hop         -0.68
PLA         -0.68
hair        -0.68
OHN         -0.67
\\\\\\\\    -0.66
Positive Logits
atory       1.35
gence       1.18
gments      1.10
ation       1.07
iment       1.06
icates      1.00
iments      1.00
ication     1.00
ices        1.00
ences       1.00
Activations Density 0.057%
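To give a sense of scale for the density figure, here is a minimal sketch that converts the percentage into an expected number of activating tokens. It assumes that density means the fraction of tokens on which the feature fires at all; the function names and the 100,000-token sample size are hypothetical and are not taken from the dashboard.

```python
def density_to_fraction(density_text: str) -> float:
    """Convert a percentage string such as '0.057%' into a fraction."""
    return float(density_text.strip().rstrip("%")) / 100.0


def expected_activations(density_text: str, n_tokens: int) -> float:
    """Expected number of tokens on which the feature fires.

    Assumption: density is the fraction of tokens with a nonzero activation.
    """
    return density_to_fraction(density_text) * n_tokens


if __name__ == "__main__":
    # Hypothetical sample size, chosen only for illustration.
    count = expected_activations("0.057%", 100_000)
    print(f"about {count:.0f} activating tokens per 100,000")
```

Under that assumption, the feature would fire on roughly 57 of every 100,000 tokens, which makes it a sparse feature.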