Explanations
words related to the beginning or onset of events or processes
words indicating the initiation or commencement of actions or events
Negative Logits
ighth             -0.77
atform            -0.65
=-=-=-=-=-=-=-=-  -0.64
atu               -0.63
Ago               -0.62
swer              -0.61
entirety          -0.61
ocally            -0.58
=-=-=-=-          -0.57
berus             -0.55
Positive Logits
behaving          1.04
noticing          0.96
to                0.96
disappearing      0.94
anew              0.93
creeping          0.88
appearing         0.88
leaking           0.88
piling            0.87
popping           0.86
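Both logit lists appear to be the ten most negative and ten most positive logit effects of this feature. The sketch below shows that selection, assuming each vocabulary token has one scalar logit effect. The dictionary name `LOGIT_EFFECTS` and the helpers `top_positive` / `top_negative` are hypothetical, and the sample values are a subset copied from the tables above rather than the full vocabulary.

```python
from typing import Dict, List, Tuple

# Hypothetical input: token -> the feature's effect on that token's logit.
# Values are a subset of the tables above; a real dashboard would score every token.
LOGIT_EFFECTS: Dict[str, float] = {
    "behaving": 1.04,
    "noticing": 0.96,
    "to": 0.96,
    "disappearing": 0.94,
    "anew": 0.93,
    "ighth": -0.77,
    "atform": -0.65,
    "atu": -0.63,
    "Ago": -0.62,
}


def top_positive(effects: Dict[str, float], k: int) -> List[Tuple[str, float]]:
    """Return the k tokens with the largest logit effect, in descending order.

    sorted() is stable even with reverse=True, so ties keep their input order.
    """
    return sorted(effects.items(), key=lambda kv: kv[1], reverse=True)[:k]


def top_negative(effects: Dict[str, float], k: int) -> List[Tuple[str, float]]:
    """Return the k tokens with the smallest (most negative) logit effect, ascending."""
    return sorted(effects.items(), key=lambda kv: kv[1])[:k]


if __name__ == "__main__":
    print("Negative Logits")
    for token, value in top_negative(LOGIT_EFFECTS, 3):
        print(f"{token:<18}{value:.2f}")
    print("Positive Logits")
    for token, value in top_positive(LOGIT_EFFECTS, 3):
        print(f"{token:<18}{value:.2f}")
```

Sorting the whole mapping is simple and fine for a sketch; for a full vocabulary, `heapq.nlargest` / `heapq.nsmallest` would avoid a complete sort.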
Activation Density: 0.061%
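Activation density is presumably the share of token positions on which this feature fires, so 0.061% means it is active on roughly 6 tokens in every 10,000. A minimal sketch, assuming "fires" means an activation strictly above a threshold of 0. The function `activation_density` and its `threshold` parameter are hypothetical, and the activations below are made-up toy data, not this feature's real values.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Fraction of token positions whose activation exceeds `threshold`."""
    if len(activations) == 0:
        raise ValueError("need at least one activation")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Toy data (hypothetical): 100,000 token positions, 61 of them active.
acts = [0.0] * 100_000
for i in range(61):
    acts[i * 1000] = 1.5

density = activation_density(acts)
print(f"Activation Density: {density:.3%}")
```

With this toy data the script prints `Activation Density: 0.061%`, i.e. 61 active positions out of 100,000.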