Explanations
expressions of personal identity and self-description
Negative Logits
lix          -0.16
риг          -0.15
egin         -0.15
propose      -0.14
ún           -0.14
Ultimately   -0.14
ultimately   -0.14
ataka        -0.14
ult          -0.13
becomes      -0.13
Positive Logits
frequently   0.27
frequ        0.26
regularly    0.25
frequent     0.24
tend         0.21
rarely       0.21
seldom       0.21
freq         0.20
often        0.20
tends        0.20
Activations Density 0.544%
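
The page does not say how the density figure is computed. A minimal sketch, assuming it is the fraction of token positions on which the feature's activation is above zero; the function name, the threshold, and the sample data below are all hypothetical and only illustrate that reading:

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of positions whose activation exceeds `threshold`.

    Assumption: "density" means the share of token positions where the
    feature fires at all. The dashboard itself does not define the term.
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical data: 1000 token positions, 5 of them with nonzero activation.
sample = [0.0] * 995 + [0.3, 1.2, 0.7, 2.5, 0.1]
density = activation_density(sample)
print(f"{density:.3%}")
```

Under this reading, a density of 0.544% would mean the feature is active on roughly 5 to 6 of every 1000 token positions.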