INDEX
Explanations
references to Greek culture and terminology
Negative Logits
Token       Logit
readcr      -0.16
cce         -0.16
icity       -0.16
ÑģÑĮ        -0.15
ization     -0.15
c           -0.15
cle         -0.15
oooooooo    -0.15
truth       -0.15
riers       -0.15
Positive Logits
Token       Logit
iness       0.23
stakes      0.22
zeitig      0.21
fulness     0.20
ening       0.20
ened        0.19
orative     0.19
fully       0.18
-quarters   0.18
quarters    0.17
Activations Density: 0.078%
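
For readers unfamiliar with the density figure, the following is a minimal sketch of how such a number could be computed, assuming density means the fraction of token positions on which the feature has a nonzero activation. The function name `activation_density`, the `threshold` parameter, and the sample data are hypothetical and are not taken from the dashboard itself.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of positions whose activation exceeds threshold.

    Assumption: "density" is the share of token positions where the
    feature fires at all, not an average activation strength.
    """
    acts = list(activations)
    if not acts:
        raise ValueError("activations must not be empty")
    firing = sum(1 for a in acts if a > threshold)
    return firing / len(acts)


# Hypothetical sample: 100,000 token positions, 78 of which fire.
sample = [1.5] * 78 + [0.0] * (100_000 - 78)
density = activation_density(sample)
print(f"{density:.3%}")
```

Under this reading, a density of 0.078% would correspond to the feature firing on roughly 78 out of every 100,000 tokens.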