Explanations
phrases indicating professional experience or tenure
Negative Logits
Ages  -0.18
ages  -0.17
194  -0.15
796  -0.15
WWII  -0.14
orough  -0.14
欠  -0.14
орот  -0.14
zn  -0.14
ActionTypes  -0.14
Positive Logits
34  0.21
30  0.20
25  0.19
33  0.19
35  0.19
38  0.18
32  0.18
27  0.18
thirty  0.17
28  0.16
Activation Density: 0.060%