Explanations
references to actions, objects, or conditions that imply a state of being or processes in various contexts
Negative Logits
ateria     -0.16
witch      -0.15
undy       -0.15
775        -0.15
774        -0.15
544        -0.15
_ATTRIB    -0.15
869        -0.15
agy        -0.14
776        -0.14
Positive Logits
venta      0.17
Hu         0.16
pf         0.15
Hu         0.14
.Timeout   0.14
oren       0.14
orp        0.13
cntl       0.13
Ïģκ        0.13
Hull       0.13
Activation Density: 0.009%