Explanations
roll and subsequent actions
Negative Logits
urnya      0.49
ua         0.48
metik      0.48
蒻         0.48
ноу        0.46
organisms  0.45
larından   0.45
UTRAL      0.44
arial      0.44
matmul     0.44
Positive Logits
rolling    1.52
roll       1.45
Roll       1.41
rolled     1.40
Rolling    1.40
Roll       1.35
Rolling    1.32
ROLL       1.24
ROLL       1.21
rollout    1.18
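Logit lists like the two above are commonly produced by projecting a feature's decoder direction through the model's unembedding matrix and taking the highest- and lowest-scoring vocabulary entries. The sketch below shows that computation under stated assumptions: `decoder_dir`, `W_U`, `vocab`, and `top_logit_tokens` are hypothetical names, and the toy inputs are invented for illustration, not this feature's real weights. The sketch returns signed values, so the suppressed tokens come out negative.

```python
import numpy as np

def top_logit_tokens(decoder_dir, W_U, vocab, k=10):
    """Hypothetical helper: project a feature's decoder direction (d_model,)
    through an unembedding matrix W_U (d_model, vocab_size) and return the
    k most-promoted and k most-suppressed tokens with their signed scores."""
    logits = decoder_dir @ W_U          # shape: (vocab_size,)
    order = np.argsort(logits)          # indices sorted by ascending score
    positive = [(vocab[i], float(logits[i])) for i in order[::-1][:k]]
    negative = [(vocab[i], float(logits[i])) for i in order[:k]]
    return positive, negative

# Toy example (assumed data): identity unembedding, so the logits
# equal the decoder direction itself.
vocab = ["roll", "rolling", "matmul", "arial"]
W_U = np.eye(4)
decoder_dir = np.array([1.5, 1.4, -0.4, -0.5])
pos, neg = top_logit_tokens(decoder_dir, W_U, vocab, k=2)
print(pos)
print(neg)
```

Sorting once and reading the order from both ends avoids a second sort for the negative list.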
Activation Density: 0.020%
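Activation density is typically the fraction of evaluated tokens on which the feature's activation is above zero. The sketch below assumes that definition (the threshold of 0 and the name `activation_density` are assumptions, not taken from this dashboard) and uses invented toy data: a feature that fires on 2 of 10,000 tokens yields a density of 0.020%.

```python
import numpy as np

def activation_density(activations, threshold=0.0):
    """Hypothetical helper: fraction of tokens whose activation exceeds threshold."""
    acts = np.asarray(activations)
    return float(np.mean(acts > threshold))

# Toy example (assumed data): the feature fires on 2 of 10,000 tokens.
acts = np.zeros(10_000)
acts[[17, 4242]] = 3.1
density = activation_density(acts)
print(f"{density:.3%}")
```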