INDEX
Explanations
proper names, specifically "Erik" and "Felix", with various numbers associated with the activations
the name "Erik" and related prominent names in the context of specific events or actions
Negative Logits
schild    -0.78
Decay     -0.77
ACTIONS   -0.72
tool      -0.71
roads     -0.68
Gh        -0.68
scape     -0.67
MX        -0.67
MQ        -0.66
align     -0.66
Positive Logits
ongyang   0.87
yip       0.83
abba      0.78
nesday    0.77
vernment  0.76
Ń·        0.76
doms      0.75
irit      0.74
rious     0.73
jri       0.73
Activations Density 0.060%