Explanations
mentions of personal experiences or encounters
mentions of personal experiences
Negative Logits
Token         Logit
upper         -0.71
cial          -0.66
cut           -0.62
trump         -0.62
ump           -0.62
decimal       -0.61
mask          -0.60
putable       -0.60
upp           -0.60
Correction    -0.59
Positive Logits
Token         Logit
experiences   3.81
experience    2.49
experien      2.02
Experience    1.86
encounters    1.82
exper         1.76
Exper         1.75
Experience    1.75
journeys      1.62
memories      1.54
Activations Density 0.007%