INDEX
Explanations
phrases or words indicating a sequence or order, specifically the word "Second" with a high activation value
phrases or words indicating a sequence or ranking
Negative Logits
ga            -0.79
isconsin      -0.65
Sab           -0.63
pen           -0.63
renheit       -0.60
iesta         -0.58
nas           -0.58
IDA           -0.57
liter         -0.56
ihar          -0.56
Positive Logits
secondly       0.75
alternatively  0.72
appropriately  0.72
conclud        0.69
importantly    0.68
wors           0.68
worms          0.64
Artifact       0.62
cynicism       0.62
surpr          0.61
Activations Density 0.115%