INDEX
Explanations
moments or events related to personal achievements or significant experiences
Negative Logits
Token          Logit
OTAL           -0.55
ogether        -0.54
halla          -0.53
common         -0.52
collectively   -0.52
results        -0.51
earch          -0.51
abee           -0.51
selves         -0.50
respectively   -0.49
Positive Logits
Token          Logit
himself        1.05
Himself        0.82
his            0.81
itone          0.61
wife           0.61
beard          0.60
persona        0.60
girlfriend     0.60
career         0.59
resign         0.58
Activations Density: 1.053%