Explanations
HTML tags and their attributes
Negative Logits
Token          Logit
hoe            -0.15
BÄĽ            -0.15
toler          -0.15
zman           -0.15
heimer         -0.15
·              -0.14
usercontent    -0.14
abh            -0.14
elle           -0.14
tol            -0.13
Positive Logits
Token          Logit
br             0.33
br             0.31
hr             0.27
BR             0.27
p              0.26
strong         0.25
BR             0.23
strong         0.23
p              0.23
HR             0.22
Activations Density: 0.074%