INDEX
Explanations
the word "fun" or similar variations indicating enjoyment or pleasure
references to enjoyment or amusement
Negative Logits
Token       Logit
Hoover      -0.69
Gork        -0.68
underest    -0.68
Hawth       -0.66
Hein        -0.64
Canter      -0.64
fright      -0.62
Horton      -0.61
crush       -0.60
ib          -0.60
Positive Logits
Token       Logit
ctions      1.68
func        1.11
fun         1.07
ancial      1.05
eral        1.00
rontal      0.92
cture       0.90
ctory       0.90
enges       0.88
aunder      0.87
Activations Density: 0.014%