INDEX
Explanations
numerical data related to experimental results
Negative Logits
зулта            -0.61
ukone            -0.55
باخ              -0.55
ve               -0.55
estad            -0.54
InjectAttribute  -0.54
^^^^^^^^         -0.53
Galería          -0.53
roughs           -0.53
:]:              -0.52
Positive Logits
0                1.00
########.        0.83
3                0.83
1                0.82
5                0.82
2                0.82
4                0.78
6                0.78
7                0.74
8                0.73
Activations Density 0.399%