INDEX
Explanations
phrases indicating qualifications or conditions related to actions or events
instances of the token "<bos>"
as, since, although
Negative Logits
Token        Logit
Houſe        -0.51
herself      -0.47
itſelf       -0.46
economico    -0.46
vieja        -0.45
Alva         -0.44
ąg           -0.44
Pergamon     -0.44
damska       -0.44
jalá         -0.42
Positive Logits
Token        Logit
there        1.56
they         1.33
it           1.22
there        1.20
we           1.06
he           0.97
THERE        0.96
There        0.90
There        0.87
although     0.85
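The page does not say how these logit values were computed. A common approach for sparse-autoencoder features is to take the feature's decoder direction, dot it with each token's unembedding vector, and rank the results, so the most positive tokens are the ones the feature pushes up and the most negative are the ones it pushes down. The sketch below assumes that approach. The helper names `logit_effects` and `top_and_bottom` are hypothetical, and the 2-dimensional vectors are made-up toy numbers, not this feature's real weights.

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical helpers: assumes logits = decoder direction . unembedding vector per token.


def logit_effects(decoder_dir: Sequence[float],
                  unembed: Dict[str, Sequence[float]]) -> Dict[str, float]:
    """Dot the feature's decoder direction with each token's unembedding vector."""
    return {tok: sum(d * w for d, w in zip(decoder_dir, vec))
            for tok, vec in unembed.items()}


def top_and_bottom(effects: Dict[str, float], k: int
                   ) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return the k most positive and the k most negative (token, value) pairs."""
    ranked = sorted(effects.items(), key=lambda kv: kv[1])
    negative = ranked[:k]                  # smallest values first
    positive = list(reversed(ranked))[:k]  # largest values first
    return positive, negative


# Toy example with made-up numbers (not the feature's real weights).
decoder_dir = [1.0, -0.5]
unembed = {
    "there": [1.2, -0.7],
    "they": [1.0, 0.0],
    "herself": [-0.3, 0.4],
}
effects = logit_effects(decoder_dir, unembed)
positive, negative = top_and_bottom(effects, k=1)
print("positive:", positive)
print("negative:", negative)
```

In a real model the unembedding has one vector per vocabulary entry, so tokens that print identically (such as the two `there` rows) can still be distinct entries, for example differing only in leading whitespace.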
Activations Density 0.139%
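Activation density is usually read as the share of tokens on which the feature is active. A density of 0.139% means the feature fires on roughly 1.39 tokens per 1,000. The sketch below assumes "active" means an activation strictly above a threshold of 0. The helper `activation_density` and the sample data are hypothetical.

```python
from typing import Iterable

# Hypothetical helper: assumes density is the percentage of tokens whose
# activation is strictly above `threshold` (0 by default).


def activation_density(activations: Iterable[float], threshold: float = 0.0) -> float:
    acts = list(activations)
    if not acts:
        raise ValueError("need at least one activation")
    firing = sum(1 for a in acts if a > threshold)
    return 100.0 * firing / len(acts)


# Made-up sample: the feature fires on 1 token out of 1,000.
sample = [0.0] * 999 + [2.3]
print(f"{activation_density(sample):.3f}%")

# Reading the reported 0.139% the other way: about one firing token per N tokens.
print(round(100 / 0.139))
```

A low density like this is consistent with a fairly selective feature, which fits the narrow explanations listed above (conditional or concessive connectives such as "as", "since", "although").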