Explanations
mentions of events or actions that started or initiated something new
instances of the word "started" in various contexts
Negative Logits
cit           -0.81
etry          -0.75
acho          -0.73
ighth         -0.73
ethy          -0.73
âĨij          -0.72
alted         -0.71
ugs           -0.69
ses           -0.69
ingly         -0.69
Positive Logits
anew           0.93
raining        0.77
PRESS          0.76
OCK            0.75
ŃĶ             0.74
airing         0.73
fuss           0.71
circulating    0.71
behaving       0.71
dating         0.67
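The page does not state how these logit scores are produced. A common approach for sparse-autoencoder feature dashboards, assumed here, is to take the dot product of the feature's decoder direction with each token's unembedding vector, then list the highest-scoring (positive) and lowest-scoring (negative) tokens. The sketch below is a minimal illustration of that assumption; the function names `logit_contributions` and `top_and_bottom` are hypothetical, and the 2-dimensional vectors are toy values, not the weights of the model behind this page.

```python
from typing import Dict, List, Tuple

def logit_contributions(decoder_dir: List[float],
                        unembed: Dict[str, List[float]]) -> Dict[str, float]:
    # Assumption: a token's score is the dot product of the feature's
    # decoder direction with that token's unembedding vector.
    return {tok: sum(d * u for d, u in zip(decoder_dir, vec))
            for tok, vec in unembed.items()}

def top_and_bottom(scores: Dict[str, float],
                   k: int) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    # Sort ascending by score: the start holds the most negative tokens,
    # the end holds the most positive ones.
    ranked = sorted(scores.items(), key=lambda kv: kv[1])
    bottom = ranked[:k]        # most negative first
    top = ranked[::-1][:k]     # most positive first
    return top, bottom

# Hypothetical toy data (not taken from the dashboard).
decoder_dir = [1.0, 0.5]
unembed = {
    "anew": [0.8, 0.2],
    "raining": [0.6, 0.3],
    "cit": [-0.7, -0.2],
    "etry": [-0.5, -0.5],
}
top, bottom = top_and_bottom(logit_contributions(decoder_dir, unembed), k=2)
print([t for t, _ in top])     # ['anew', 'raining']
print([t for t, _ in bottom])  # ['cit', 'etry']
```

With a real model the unembedding has one vector per vocabulary entry, so the same ranking step runs over the full vocabulary rather than four tokens.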
Activation Density: 0.056%
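Activation density is typically reported as the fraction of sampled token positions on which the feature fires. This sketch assumes "fires" means an activation strictly above zero; the threshold and the sampled token set behind this page are not stated. At 0.056%, the feature is active on about 56 of every 100,000 tokens. The helper `activation_density` is hypothetical.

```python
from typing import Iterable

def activation_density(activations: Iterable[float], threshold: float = 0.0) -> float:
    """Fraction of token positions whose activation exceeds `threshold`."""
    total = 0
    active = 0
    for a in activations:
        total += 1
        if a > threshold:   # assumption: "active" means strictly above the threshold
            active += 1
    return active / total if total else 0.0

# Hypothetical sample: 56 active positions out of 100,000.
acts = [0.0] * 99_944 + [1.0] * 56
print(f"{activation_density(acts):.3%}")  # 0.056%
```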