INDEX
Explanations
text related to user interactions involving clicking actions
instances of the word "click" indicating user interactions
Negative Logits
Token        Logit
Chancellor   -0.71
utive        -0.68
1001         -0.67
von          -0.64
Wein         -0.63
Luxem        -0.63
ASC          -0.62
Abram        -0.62
Zam          -0.60
Wol          -0.60
Positive Logits
Token        Logit
click        1.18
clicks       0.95
click        0.95
alore        0.91
lish         0.86
hent         0.84
clicked      0.83
Click        0.80
urated       0.78
clicking     0.76
Activations Density 0.011%
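
The density figure above is usually read as the fraction of sampled tokens on which the feature fires at all. The page does not state how it was computed, so the following is only a minimal sketch under that assumption. The function name `activation_density`, the `threshold` parameter, and the sample data are hypothetical and are not taken from the dashboard.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Return the fraction of tokens whose activation exceeds `threshold`.

    Assumption: "density" means (number of active tokens) / (total tokens).
    """
    if len(activations) == 0:
        raise ValueError("activations must be non-empty")
    # Count tokens on which the feature is considered active.
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical sample: the feature fires on 1 token out of 10,000.
sample = [0.0] * 9999 + [2.5]
density = activation_density(sample)
print(f"Activations Density {density:.3%}")
```

Under this reading, a density of 0.011% would mean the feature is active on roughly 11 of every 100,000 tokens, which is consistent with a feature that responds narrowly to "click"-like tokens.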