Explanations
references to quantities or vague numerical descriptors
Negative Logits
Token          Logit
347            -0.16
ritten         -0.15
Gim            -0.14
rompt          -0.14
ola            -0.14
iki            -0.14
vyšší          -0.14
424            -0.14
opause         -0.13
ilia           -0.13
Positive Logits
Token          Logit
unately         0.22
instance        0.20
sake            0.20
example         0.19
reasons         0.17
enco            0.17
purposes        0.17
ced             0.17
ged             0.16
InstanceState   0.16
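The dashboard does not say how these logit values are derived. A common convention for SAE feature dashboards, assumed here and not confirmed by this page, is to project the feature's decoder direction through the model's unembedding matrix and list the most negative and most positive entries. The sketch below shows that ranking step in plain Python. The function names `logit_effects` and `top_and_bottom` and all numbers in the toy example are hypothetical.

```python
# Sketch only: ASSUMES a feature's logit effect on each vocabulary token is the
# dot product of its decoder direction with that token's unembedding column.


def logit_effects(decoder_dir, unembed):
    """Project a d_model-sized decoder direction through a d_model x vocab unembedding."""
    vocab_size = len(unembed[0])
    return [
        sum(decoder_dir[i] * unembed[i][j] for i in range(len(decoder_dir)))
        for j in range(vocab_size)
    ]


def top_and_bottom(effects, tokens, k):
    """Return the k most positive and k most negative (token, effect) pairs (k >= 1)."""
    order = sorted(range(len(effects)), key=lambda j: effects[j])  # ascending
    negative = [(tokens[j], effects[j]) for j in order[:k]]
    positive = [(tokens[j], effects[j]) for j in reversed(order[-k:])]
    return positive, negative


if __name__ == "__main__":
    # Hypothetical toy numbers: d_model = 2, vocabulary of 4 tokens.
    tokens = ["instance", "example", "ritten", "347"]
    unembed = [
        [0.5, 0.4, -0.3, -0.2],
        [0.1, 0.0, 0.1, 0.0],
    ]
    decoder_dir = [0.4, 0.2]
    effects = logit_effects(decoder_dir, unembed)
    positive, negative = top_and_bottom(effects, tokens, k=2)
    print("positive:", positive)
    print("negative:", negative)
```

In a real model the unembedding would be a large matrix and a numerical library would replace the nested loops; the pure-Python version is only meant to make the ranking logic explicit.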
Activation Density: 0.073%
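A density of 0.073% means the feature fires on roughly 73 of every 100,000 tokens, so it is a sparse feature. The page does not state the dataset or the firing threshold. The minimal sketch below assumes density is the percentage of tokens whose activation is strictly greater than zero. The function name `activation_density` and the sample activations are hypothetical.

```python
def activation_density(activations, threshold=0.0):
    """Percentage of tokens on which the feature activation exceeds `threshold`.

    ASSUMPTION: a token counts as "firing" when its activation is strictly
    greater than `threshold` (0.0 by default).
    """
    if not activations:
        raise ValueError("need at least one activation")
    fired = sum(1 for a in activations if a > threshold)
    return 100.0 * fired / len(activations)


if __name__ == "__main__":
    # Hypothetical sample: 73 firing tokens out of 100,000.
    acts = [0.0] * 99_927 + [1.3] * 73
    print(f"{activation_density(acts):.3f}%")
```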