Explanations
temporal references related to past events
Negative Logits
_BEFORE    -0.19
before     -0.18
_before    -0.17
قبل        -0.16
before     -0.16
they       -0.16
antes      -0.16
Before     -0.16
然         -0.15
Before     -0.15
Positive Logits
words      0.24
ward       0.21
wards      0.20
hand       0.19
word       0.19
wards      0.18
них        0.18
//{{       0.17
neath      0.17
them       0.16
Activations Density 0.083%
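
For readers unfamiliar with the density figure, here is a minimal sketch of how such a number could be computed. It assumes that activation density means the fraction of token positions on which the feature's activation is above zero; the page itself does not define the term. The function name `activation_density`, the `threshold` parameter, and the sample data are all hypothetical and chosen only for illustration.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of token positions whose activation exceeds threshold.

    Assumption: "density" is the share of tokens on which the feature fires.
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical data: 12,000 token positions, 10 of which activate the feature.
sample = [0.0] * 11990 + [1.5] * 10
density = activation_density(sample)
print(f"{density:.3%}")  # prints "0.083%"
```

Under this reading, a density of 0.083% would mean the feature fires on roughly 1 in 1,200 tokens, which makes it a sparse feature.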