INDEX
Explanations
mentions of the word "brain"
Negative Logits
Bundy       -0.73
nesday      -0.70
FANTASY     -0.69
Arabia      -0.66
Bowie       -0.66
adoes       -0.65
impunity    -0.65
Faust       -0.63
Mex         -0.62
Riy         -0.62
POSITIVE LOGITS
stem        1.38
washed      1.26
washing     1.18
wash        1.11
iac         1.07
waves       0.99
fuck        0.92
dead        0.89
storms      0.89
caps        0.87
Activations Density: 0.019%