Explanations
phrases related to overuse or excessive behavior
Negative Logits
reve         -0.18
gle          -0.17
aub          -0.16
leton        -0.16
gebn         -0.15
öl           -0.15
_dispatch    -0.14
tri          -0.14
ropol        -0.14
Rosenberg    -0.14
Positive Logits
counter      0.31
-counter     0.30
Counter      0.28
counter      0.28
Counter      0.26
ounter       0.22
_counter     0.21
(counter     0.21
OUNTER       0.20
ternet       0.20
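The logit lists above rank vocabulary tokens by how strongly this feature pushes their output logits down (negative) or up (positive). A minimal sketch of how such lists could be produced, assuming (this is not stated on the page) that each token's value is the dot product of the feature's decoder direction with that token's unembedding column. The names `w_dec`, `w_u`, `logit_effects`, and `top_and_bottom_logits` are hypothetical and chosen for illustration only.

```python
def logit_effects(w_dec, w_u):
    """Assumed logit effect per token: dot(decoder direction, unembedding column).

    w_dec: list of d_model floats (hypothetical feature decoder direction)
    w_u:   dict mapping token string -> list of d_model floats (hypothetical unembedding)
    """
    return {tok: sum(a * b for a, b in zip(w_dec, col)) for tok, col in w_u.items()}


def top_and_bottom_logits(effects, k=10):
    """Return the k most positive and k most negative (token, value) pairs."""
    ranked = sorted(effects.items(), key=lambda kv: kv[1])  # ascending by value
    bottom = ranked[:k]           # most negative first
    top = ranked[::-1][:k]        # most positive first
    return top, bottom


# Toy example with a 2-dimensional residual stream.
w_dec = [1.0, 0.0]
w_u = {"counter": [0.3, 0.1], "reve": [-0.2, 0.5], "tri": [0.0, 1.0]}
top, bottom = top_and_bottom_logits(logit_effects(w_dec, w_u), k=1)
print(top, bottom)
```

On a real model the same ranking would run over the full vocabulary with `k=10`, matching the ten entries shown in each list.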
Activation Density: 0.009%
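Activation density is read here as the percentage of tokens on which the feature fires; 0.009% corresponds to roughly 9 activating tokens per 100,000. A minimal sketch, assuming "fires" means an activation strictly above a threshold of 0 (the page does not state the threshold); `activation_density` is a hypothetical helper.

```python
def activation_density(activations, threshold=0.0):
    """Percentage of token positions whose activation exceeds `threshold` (assumed 0)."""
    if not activations:
        raise ValueError("need at least one activation")
    firing = sum(1 for a in activations if a > threshold)
    return 100.0 * firing / len(activations)


# 9 firing tokens out of 100,000 gives a density of about 0.009%.
acts = [1.0] * 9 + [0.0] * 99_991
print(f"{activation_density(acts):.3f}%")
```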