INDEX
Explanations
pre-installed, enforced, fundamental
Negative Logits
2    0.51
3    0.48
6    0.44
raw    0.43
7    0.43
-    0.42
commonplace    0.42
either    0.41
at    0.40
پیر    0.40
Positive Logits
الْم    0.41
čky    0.40
Majid    0.40
Dyson    0.40
भूमि    0.38
त्यात    0.38
و    0.38
krijgen    0.38
అంశ    0.38
viktigt    0.38
Activations Density 0.001%