Explanations
phrases related to returning or reflecting on past events
Negative Logits
orthodox    -0.77
imon        -0.71
icons       -0.67
shown       -0.67
ruff        -0.67
olor        -0.66
stant       -0.65
orst        -0.65
ording      -0.65
tops        -0.64
Positive Logits
undone      1.11
forth       0.96
ashore      0.92
hither      0.88
into        0.81
apart       0.81
roaring     0.81
closer      0.80
leon        0.78
out         0.77
Activations Density 1.061%