Explanations
punctuation marks and their surrounding contextual phrases
Negative Logits
deps            -0.15
olvers          -0.15
737             -0.15
InView          -0.15
uling           -0.14
965             -0.14
orman           -0.14
alla            -0.14
oi              -0.14
ixin            -0.13
Positive Logits
everything      0.21
everything      0.20
Everything      0.20
Everything      0.20
things          0.17
traction        0.16
NOTHING         0.15
Things          0.15
temperatures    0.15
tudo            0.15
Activations Density 0.019%