INDEX
Explanations
I'm sorry, but the provided text does not offer enough information for making a reliable analysis of what neuron 4 is looking for
terms related to political power and its dynamics
Negative Logits
OM            -0.67
surprises     -0.63
OTOS          -0.60
omas          -0.58
Rodrig        -0.58
È             -0.57
conscience    -0.57
hift          -0.56
ovember       -0.56
love          -0.55
Positive Logits
ping          1.72
ped           1.63
ps            1.46
pers          1.43
py            1.39
p             1.37
per           1.32
pel           1.29
pit           1.27
pter          1.24
Activations Density: 0.084%