INDEX
Explanations
phrases related to progress or change in a positive context
the phrase "come" or "came" in various contexts indicating arrival or change
Negative Logits
Token      Logit
olor       -0.67
Defense    -0.60
ORED       -0.58
IELD       -0.58
oring      -0.58
endars     -0.57
graph      -0.56
orneys     -0.56
Defense    -0.56
Watkins    -0.54
Positive Logits
Token      Logit
undone     0.91
together   0.79
hither     0.78
forth      0.77
unst       0.74
iment      0.73
alive      0.72
into       0.71
ubiqu      0.69
Together   0.68
Activations Density: 0.068%