INDEX
Explanations
phrases indicating specific cases or examples
Negative Logits
enan            -0.07
yz              -0.07
itters          -0.06
aÄį             -0.06
.serialization  -0.06
inz             -0.06
ragen           -0.06
ewe             -0.06
asi             -0.06
roach           -0.06
Positive Logits
case            0.18
cases           0.16
caso            0.14
case            0.14
Case            0.13
cases           0.12
_case           0.12
Cases           0.12
-case           0.12
Case            0.11
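Logit lists like these are commonly produced by projecting the feature's decoder direction through the model's unembedding matrix and reading off the most negative and most positive entries. The following is a minimal sketch under these assumptions: NumPy arrays, a decoder vector `w_dec` of shape (d_model,), and an unembedding `w_u` of shape (d_model, d_vocab). The function name `top_logit_effects` and the toy inputs are hypothetical, and this dashboard may normalize or post-process its values differently.

```python
import numpy as np


def top_logit_effects(w_dec: np.ndarray, w_u: np.ndarray, vocab: list[str], k: int = 10):
    """Return the k most negative and k most positive (token, logit) pairs.

    Hypothetical helper: assumes w_dec has shape (d_model,) and
    w_u has shape (d_model, d_vocab), with len(vocab) == d_vocab.
    """
    logits = w_dec @ w_u          # direct effect of the feature on each vocab logit
    order = np.argsort(logits)    # indices sorted from most negative to most positive
    negative = [(vocab[i], float(logits[i])) for i in order[:k]]
    positive = [(vocab[i], float(logits[i])) for i in order[::-1][:k]]
    return negative, positive


# Toy example with a 2-dimensional residual stream and a 4-token vocabulary.
vocab = ["a", "b", "c", "d"]
w_dec = np.array([1.0, 0.0])
w_u = np.array([[0.5, -0.2, 0.1, 0.0],
                [9.0, 9.0, 9.0, 9.0]])
neg, pos = top_logit_effects(w_dec, w_u, vocab, k=2)
```

Here the second residual dimension is ignored because the decoder direction has zero weight on it, so the logits are just the first row of `w_u`.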
Activation Density: 0.017%
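Activation density is the fraction of token positions on which the feature fires. The following is a minimal sketch. It assumes activations are collected into a flat NumPy array and that "fires" means strictly greater than zero. `activation_density` is a hypothetical helper, and the 17-in-100,000 input is purely illustrative, not the data behind the figure above.

```python
import numpy as np


def activation_density(acts: np.ndarray, threshold: float = 0.0) -> float:
    """Fraction of token positions whose activation exceeds `threshold` (assumed 0)."""
    acts = np.asarray(acts)
    return float(np.mean(acts > threshold))


# Illustrative input: the feature fires on 17 out of 100,000 tokens.
acts = np.zeros(100_000)
acts[:17] = 1.0
density = activation_density(acts)
print(f"{density:.3%}")  # prints "0.017%"
```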