Explanations
terms related to the brain
references to the brain
Negative Logits
Token          Logit
Faust          -0.65
Bundy          -0.65
Dialog         -0.64
Bowie          -0.64
Arabia         -0.64
nesday         -0.63
Voters         -0.63
adoes          -0.63
Chamberlain    -0.62
riott          -0.62
Positive Logits
Token          Logit
stem           1.34
washed         1.32
washing        1.22
wash           1.14
iac            1.07
waves          1.03
storms         0.99
fuck           0.95
dead           0.91
wave           0.91
Activations Density: 0.034%