Explanations
phrases related to personal stories or experiences
mentions of personal experiences
Negative Logits
Token    Logit
sub      -0.73
hatt     -0.71
vous     -0.66
tumor    -0.64
cut      -0.62
apo      -0.62
law      -0.61
tra      -0.61
cise     -0.61
Sabha    -0.61
Positive Logits
Token        Logit
experiences  1.22
Experience   0.97
iences       0.97
experien     0.95
experience   0.91
Experience   0.85
ttes         0.82
Exper        0.82
OWS          0.80
ivities      0.79
Activations Density: 0.016%