INDEX
Explanations
references to whispering or related forms of quiet communication
whispered speech
Negative Logits
Carls   -0.50
Carl    -0.47
Sega    -0.47
Tog     -0.46
a       -0.45
Rango   -0.44
Carl    -0.44
Atari   -0.42
Suit    -0.42
Trig    -0.42
Positive Logits
whisper      1.87
whispering   1.73
whispers     1.70
Whisper      1.70
whisper      1.69
Whisper      1.63
whispered    1.59
Whis         1.30
whis         1.28
Whis         1.27
Activations Density 0.001%