INDEX
Explanations
phrases indicating completion or accomplishment
pronouns and verbs that indicate actions or capabilities related to doing something
Negative Logits
Cabin      -0.68
Craw       -0.63
Wide       -0.61
Ruk        -0.59
XT         -0.59
Bunker     -0.59
Upper      -0.58
ixels      -0.58
Toggle     -0.58
cv         -0.57
Positive Logits
own        0.92
lawfully   0.87
self       0.78
selves     0.77
pretended  0.74
areth      0.72
predec     0.72
OWN        0.71
intended   0.68
normally   0.68
Activations Density 0.202%