Explanations
phrases indicating various experiences and actions
instances of having accomplished actions or ongoing states
Negative Logits
Token      Logit
Apart      -0.73
monog      -0.66
Friendship -0.63
Outer      -0.59
aments     -0.59
TG         -0.59
Ascension  -0.58
Useful     -0.57
doom       -0.56
hostage    -0.55
Positive Logits
Token      Logit
been       1.36
been       1.22
become     1.14
begun      1.12
undergone  1.07
gotten     1.03
risen      1.02
Been       0.93
emerged    0.93
gone       0.92
Activations Density 0.279%