Explanations
the presence of asterisks and associated formatting in the text
Negative Logits
Token         Logit
Colette       -0.72
lito          -0.72
Blak          -0.71
MTA           -0.71
albert        -0.70
Sally         -0.70
Nava          -0.70
mina          -0.70
Peres         -0.69
Raton         -0.69
Positive Logits
Token         Logit
¡¡            1.09
)**           1.01
(**           0.99
wikipagina    0.99
/****         0.94
]**           0.92
●●            0.90
kwargs        0.89
¡¡            0.87
.**           0.87
Activations Density: 0.297%