Explanations
references to liberal arts and liberal ideologies
Negative Logits
Token        Logit
iru          -0.72
Blazing      -0.71
Saga         -0.70
BLE          -0.64
Redemption   -0.63
senal        -0.62
Danger       -0.60
Dragons      -0.58
Gi           -0.58
gpu          -0.57
Positive Logits
Token        Logit
arts         0.98
izing        0.85
ization      0.83
esse         0.82
izations     0.80
nesota       0.76
Democrat     0.76
itarian      0.74
neapolis     0.73
izes         0.73
Activations Density 0.070%