Explanations
references to specific locations or people whose names contain "sa"
references to a specific individual, likely a prominent figure
Negative Logits
furious         -0.69
Hex             -0.67
neck            -0.66
Furious         -0.65
degener         -0.63
interactions    -0.63
met             -0.62
empath          -0.62
gears           -0.61
batter          -0.60
Positive Logits
sa              4.51
si              1.63
sam             1.49
SA              1.41
sin             1.34
sha             1.32
sal             1.29
Sa              1.27
sb              1.24
sg              1.21
Activations Density: 0.007%