INDEX
Explanations
historical and political references, particularly related to individuals and events
Negative Logits
FOX        -0.84
Safety     -0.83
blogs      -0.79
tools      -0.78
ourcing    -0.76
Accuracy   -0.76
ratom      -0.76
malink     -0.75
Trend      -0.73
Wisconsin  -0.73
Positive Logits
Augustus   1.40
XVI        1.33
VIII       1.25
XIV        1.20
Tud        1.19
Ferdinand  1.15
XII        1.13
ibn        1.13
Herod      1.12
Napoleon   1.12
Activations Density: 0.179%