INDEX
Explanations
instances where actions or events are being described
words related to personal experiences and coping strategies
Negative Logits
Token         Logit
tomorrow      -0.94
earlier       -0.86
yesterday     -0.85
tonight       -0.82
later         -0.75
beforehand    -0.74
beer          -0.73
today         -0.71
Pigs          -0.67
Sins          -0.65
Positive Logits
Token           Logit
steadily        1.05
progressively   0.92
relentlessly    0.87
ModLoader       0.77
tirelessly      0.77
consistently    0.75
continuously    0.73
metic           0.72
Reviewer        0.71
mathemat        0.70
Activations Density: 0.455%