INDEX
Explanations
text surrounded by a lot of underscores
placeholder text or redacted information
Negative Logits
divers  -0.90
oms  -0.70
ktop  -0.67
irection  -0.66
ifled  -0.64
subord  -0.63
Commission  -0.63
ena  -0.63
division  -0.63
roads  -0.63
Positive Logits
______  1.49
_______  1.39
_____  1.34
________________  1.28
___  1.28
________________________  1.25
________  1.24
____  1.24
________________________________________________________________  1.16
________________________________  1.05
Activations Density 0.013%