INDEX
Explanations
HTML and JavaScript elements related to user interface components
HTML tags and structure
Negative Logits
pleaſure      -0.84
myſelf        -0.82
<unused41>    -0.79
<unused79>    -0.79
<unused16>    -0.79
<unused3>     -0.79
<unused14>    -0.78
<unused23>    -0.78
<unused8>     -0.78
<pad>         -0.78
Positive Logits
hid           0.33
waiting       0.32
ar            0.31
inf           0.31
i             0.31
(blank)       0.30
test          0.28
e             0.28
t             0.28
자            0.28
Activations Density 0.048%
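
An activation density like the one above is commonly read as the fraction of sampled token positions on which the feature fires. The page does not say how the figure was computed, so the following is only a minimal sketch under that assumption. The function name `activation_density`, the zero threshold, and the toy activation values are all hypothetical and are not taken from the dashboard.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Return the fraction of activations strictly above `threshold`.

    Assumption: a feature counts as "active" on a token when its
    activation is greater than `threshold` (0.0 by default).
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Toy data (made up for illustration): 10,000 token positions, 5 of which fire.
toy = [0.0] * 9995 + [1.2, 0.4, 3.1, 0.9, 2.2]
density = activation_density(toy)
print(f"{density:.3%}")
```

A value this small means the feature is sparse: it is silent on almost every token and fires only on a narrow slice of inputs.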