Explanations
keywords related to politics
mentions of the word "politics" and related variations
Negative Logits
imates       -0.87
uran         -0.74
actory       -0.72
clud         -0.68
amination    -0.68
untary       -0.65
Vera         -0.65
ependence    -0.64
Universal    -0.63
oa           -0.62
Positive Logits
correctness  1.08
clinton      0.85
eering       0.85
atism        0.83
hare         0.82
cape         0.79
chool        0.78
manship      0.77
esp          0.73
hip          0.73
Activations Density: 0.036%