INDEX
Explanations
instances of the word "discover" along with phrases indicating exploration or finding new things
Negative Logits
Token     Logit
ansom     -0.19
enden     -0.15
gers      -0.15
pel       -0.15
uate      -0.14
birds     -0.14
ronics    -0.14
ele       -0.14
ker       -0.14
owns      -0.14
Positive Logits
Token      Logit
673        0.15
iÃŁ        0.14
озмож      0.14
IMPLEMENT  0.14
sole       0.14
æĿ¥ãģŁ     0.14
DidLoad    0.13
Touches    0.13
å¼Ł        0.13
oku        0.13
Activations Density 0.011%
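
Several positive-logit tokens (iÃŁ, æĿ¥ãģŁ, å¼Ł) look like raw byte-level BPE strings rather than readable text. The sketch below shows one way to decode them, assuming the tokenizer uses the GPT-2 style byte-to-unicode table; the dashboard does not state which tokenizer produced these tokens, so that is an assumption. The helper names `byte_to_unicode_table` and `decode_token` are hypothetical and chosen for illustration only.

```python
def byte_to_unicode_table():
    """Build the GPT-2 style map from byte values (0-255) to printable characters.

    Assumption: the tokens on this page use this table. Bytes that are already
    printable map to themselves; the rest are shifted to code points from 256 up.
    """
    printable = (
        list(range(ord("!"), ord("~") + 1))
        + list(range(ord("¡"), ord("¬") + 1))
        + list(range(ord("®"), ord("ÿ") + 1))
    )
    byte_values = printable[:]
    code_points = printable[:]
    shift = 0
    for b in range(256):
        if b not in printable:
            byte_values.append(b)
            code_points.append(256 + shift)
            shift += 1
    return {b: chr(c) for b, c in zip(byte_values, code_points)}


# Inverse map: displayed character -> original byte value.
UNICODE_TO_BYTE = {ch: b for b, ch in byte_to_unicode_table().items()}


def decode_token(token):
    """Return the UTF-8 text behind a byte-level token string.

    Tokens containing characters outside the table (for example text that is
    already decoded, such as Cyrillic) are returned unchanged.
    """
    try:
        raw = bytes(UNICODE_TO_BYTE[ch] for ch in token)
    except KeyError:
        return token
    # Partial multi-byte sequences are shown with the replacement character.
    return raw.decode("utf-8", errors="replace")


if __name__ == "__main__":
    for tok in ["iÃŁ", "æĿ¥ãģŁ", "å¼Ł", "DidLoad", "озмож"]:
        print(repr(tok), "->", repr(decode_token(tok)))
```

If the assumption holds, the three byte-level tokens would correspond to "iß", "来た", and "弟", while ASCII tokens such as DidLoad pass through unchanged. Treat these readings as tentative until the tokenizer is confirmed.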