Explanations
action or choice followed by outcome
Negative Logits
frustrated 0.48
fundamentally 0.45
SHOP 0.45
shelf 0.44
babe 0.44
revolve 0.44
revolves 0.44
handed 0.43
spun 0.43
conference 0.43
Positive Logits
२ 0.52
Información 0.47
Descripción 0.47
१ 0.46
૨ 0.46
amię 0.45
",", 0.45
۲ 0.45
qualche 0.45
ඩ් 0.44
Activation Density: 0.004%