INDEX
Explanations
parentheses and their frequency in the text
Negative Logits
APER      -0.08
830       -0.07
aper      -0.07
tright    -0.07
342       -0.07
marvin    -0.07
eeper     -0.07
avel      -0.07
inski     -0.06
800       -0.06
Positive Logits
两个        0.07
三个        0.07
Cly       0.07
unken     0.06
τισ       0.06
steen     0.06
disposed  0.06
lere      0.06
WISE      0.06
Roland    0.06
Activations Density 0.031%