Explanations
numerical information or data points in the text
Negative Logits
akh       -0.16
punk      -0.15
iggers    -0.15
aks       -0.15
iaux      -0.14
ools      -0.14
ãĤ        -0.14
Âľ        -0.14
ponsive   -0.14
úi        -0.14
Positive Logits
ben       0.14
adiens    0.14
оÑĩ       0.14
eeper     0.14
Ĥ¬        0.14
DDL       0.14
æŁ        0.14
bene      0.13
rint      0.13
PFN       0.13
Activations Density 0.005%