Explanations
references to quantities or articles in a descriptive context
Negative Logits
Token      Logit
MM         -0.15
cf         -0.14
ipel       -0.14
guard      -0.14
MAN        -0.14
agascar    -0.13
icion      -0.13
umat       -0.13
MB         -0.13
ubbo       -0.13
Positive Logits
Token      Logit
iversal    0.18
maal       0.18
526        0.17
pha        0.16
altro      0.15
ltra       0.14
autre      0.14
огда       0.14
krom       0.14
ERGY       0.14
Activations Density: 0.033%