INDEX
Explanations
temporal references or time-related phrases
Head Attr Weights
0:0.02
1:0.03
2:0.11
3:0.15
4:0.34
5:0.02
6:0.04
7:0.06
8:0.03
9:0.03
10:0.07
11:0.05
Negative Logits
aeda        -2.00
cius        -1.77
ads         -1.74
<?          -1.72
tics        -1.53
itute       -1.47
Enemies     -1.45
,,,,,,,,    -1.45
@@          -1.45
imitate     -1.42
Positive Logits
vacated     1.74
intervened  1.62
soon        1.59
Jub         1.57
flooded     1.56
Tuc         1.52
exited      1.51
Untitled    1.50
Marqu       1.48
later       1.46
Activations Density: 0.073%