INDEX
Explanations
movement toward something, or strategic action
Negative Logits
ad 0.51
Moving 0.50
j 0.46
task 0.44
moving 0.43
Moving 0.43
task 0.43
Task 0.42
moving 0.41
ok 0.41
Positive Logits
made 0.54
applauded 0.49
regretted 0.48
荥 0.45
gemaakt 0.45
سوی 0.44
estrategia 0.44
bylaws 0.43
ৈতিক 0.43
маъ 0.42
Activation Density: 0.009%