Explanations
the word "The" in various contexts
Negative Logits
Viewer    -0.15
nda       -0.14
ours      -0.14
nger      -0.14
мож       -0.14
uters     -0.13
ctors     -0.13
agger     -0.13
едагог    -0.13
½         -0.12
Positive Logits
orem         0.15
rone         0.15
-fixed       0.15
галі         0.15
gangbang     0.14
eil          0.14
kening       0.14
teil         0.14
Huck         0.14
эксплуата    0.14
Activations Density 0.376%