INDEX
Explanations
references to Roman numerals and similarly formatted citations or enumerations
Negative Logits
Token    Logit
ewise    -0.19
zas      -0.16
leased   -0.15
/Dk      -0.15
GMEM     -0.14
egin     -0.14
/inet    -0.14
otte     -0.13
alat     -0.13
assen    -0.13
Positive Logits
Token    Logit
III      0.44
III      0.41
II       0.36
II       0.35
IV       0.32
iii      0.32
iii      0.31
VIII     0.31
VII      0.31
IV       0.29
Activations Density: 0.043%