INDEX
Explanations
numerical data related to performance metrics and statistical measures
New Auto-Interp
Negative Logits
prob        -0.16
geil        -0.15
звичай      -0.14
äter        -0.14
+↵↵         -0.14
ік          -0.14
ака         -0.14
aleb        -0.14
ods         -0.13
Gron        -0.13
Positive Logits
negative    0.50
(-          0.46
(-          0.44
minus       0.43
Negative    0.42
negative    0.39
`-          0.39
[-          0.38
Negative    0.38
=-          0.38
Activations Density 0.145%