Explanations
references to historical figures and events, especially from ancient Rome
Negative Logits
ROY        -0.15
ellen      -0.15
ュー       -0.15
견         -0.14
Chim       -0.14
ialias     -0.14
iali       -0.14
CodeGen    -0.13
pek        -0.13
_Grid      -0.13
Positive Logits
Pub        0.27
Pub        0.24
Quint      0.22
Sext       0.22
Це         0.21
Brut       0.20
Marcus     0.20
Tit        0.20
Tit        0.20
Serv       0.20
Activations Density 0.009%