INDEX
Explanations
numbers within a specific range
numerical values associated with rankings or measurements
Negative Logits
illiter       -0.65
phony         -0.64
wholes        -0.62
contrace      -0.62
cooperative   -0.62
unex          -0.61
fictitious    -0.61
Pasadena      -0.61
laure         -0.61
bom           -0.61
Positive Logits
][            1.88
]             1.47
].            1.38
]).           1.36
]"            1.35
],[           1.31
]-            1.28
],            1.27
](            1.26
])            1.25
Activations Density 0.050%