INDEX
Explanations
items or concepts that involve comparison or evaluation
Negative Logits
Token             Logit
VERTISEMENT       -0.96
ArgsConstructor   -0.92
leſs              -0.80
ChildScrollView   -0.77
ſſen              -0.77
ſſel              -0.76
CHREIB            -0.76
Rptr              -0.76
Lycka             -0.74
Datuak            -0.73
Positive Logits
Token           Logit
↵↵              1.58
↵               1.42
↵↵↵             1.24
↵↵↵↵            1.17
<eos>           1.09
[toxicity=0]    1.09
↵↵↵↵↵           1.07
↵↵↵↵↵↵          0.98
↵↵↵↵↵↵↵         0.89
↵↵↵↵↵↵↵↵↵       0.88
Activations Density: 0.014%