INDEX
Explanations
personal pronouns and phrases indicating understanding or acknowledgment
Negative Logits
arta      -0.15
unami     -0.14
GLOBALS   -0.14
elper     -0.13
ĶåĽŀ      -0.13
ceptar    -0.13
обов      -0.13
innacle   -0.13
uzzi      -0.13
enci      -0.13
Positive Logits
idea       0.45
Idea       0.41
idea       0.37
drift      0.28
gist       0.25
IDEA       0.25
ideas      0.24
Ideas      0.23
principle  0.22
drill      0.22
Activations Density: 0.055%