INDEX
Explanations
mentions of editors and editing roles or positions
Negative Logits
-0.62  */].
-0.56  siker
-0.51  Gad
-0.51  무
-0.50  typeof
-0.49  WALL
-0.49  szabad
-0.49  ور
-0.48  Gla
-0.48  Bul
Positive Logits
1.20  editor
1.07  Editor
1.06  EDITOR
1.02  editors
0.99  editor
0.84  trustees
0.83  Editor
0.82  Trustees
0.81
0.81  Editors
Activations Density: 0.121%