INDEX
Explanations
information related to numerical quantities and statistics
numeric values and statistics
Negative Logits
perspect      -0.78
advoc         -0.74
enthusi       -0.71
princ         -0.69
nodd          -0.67
philos        -0.67
duty          -0.66
conduc        -0.66
privileges    -0.66
corrections   -0.65
Positive Logits
5     1.94
6     1.80
8     1.79
7     1.78
4     1.77
3     1.76
9     1.70
2     1.66
1     1.58
75    1.58
Activations Density 0.059%