INDEX
Explanations
sequences of underscores and non-standard formatting markers
Negative Logits
3           -0.58
.           -0.58
1           -0.57
/           -0.56
The         -0.55
4           -0.55
2           -0.54
(blank)     -0.54
5           -0.54
0           -0.51
Positive Logits
ьаж         0.95
protoimpl   0.95
myſelf      0.91
UserScript  0.89
Савезне     0.88
pleaſure    0.87
談社        0.86
featureID   0.86
itſelf      0.84
Lightboxes  0.84
Activations Density 0.272%