INDEX
Explanations
repetitive phrases or contextually significant markers in the text
Negative Logits
uz      -0.19
idis    -0.15
uses    -0.14
ther    -0.14
and     -0.14
ter     -0.14
ader    -0.14
icon    -0.13
Cave    -0.13
uv      -0.13
Positive Logits
alone       0.22
is          0.21
fact        0.20
situation   0.17
has         0.17
fact        0.17
isa         0.17
trend       0.16
latter      0.16
itself      0.16
Activations Density 0.182%
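
The density figure can be illustrated with a minimal sketch, assuming it denotes the share of sampled tokens on which the feature has a nonzero activation (the page itself does not define it). The function name `activation_density` and the sample data are hypothetical and do not come from the dashboard.

```python
def activation_density(activations):
    """Return the percentage of tokens whose activation is above zero.

    Assumption: "density" means the fraction of tokens where the
    feature fires at all; the dashboard does not state its definition.
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > 0)
    return 100.0 * active / len(activations)


# Hypothetical sample: the feature fires on 2 of 1,000 tokens.
sample = [0.0] * 998 + [1.3, 0.4]
print(f"{activation_density(sample):.3f}%")
```

Under this reading, a density of 0.182% would mean the feature is active on roughly 2 tokens in every 1,000.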