INDEX
Explanations
phrases that indicate enjoyment or fun
instances of the word "fun" in various contexts
Negative Logits
Token          Logit
bottleneck     -0.72
Centauri       -0.69
BOOK           -0.63
attle          -0.62
underest       -0.61
proport        -0.61
obook          -0.59
Transparency   -0.59
helicop        -0.59
defective      -0.59
Positive Logits
Token          Logit
nels           1.57
gal            1.14
eral           1.09
nell           1.09
nel            1.02
ctor           0.93
ctory          0.86
ilee           0.86
ctions         0.83
imation        0.83
Activations Density: 0.025%