Explanations
phrases that indicate a focus on international relations or geopolitical dynamics
Negative Logits
feld         -0.14
ajan         -0.13
Person       -0.13
lez          -0.12
adium        -0.12
forwarded    -0.12
person       -0.12
-Sah         -0.12
GUI          -0.12
_GUI         -0.12
Positive Logits
part         0.28
contrast     0.27
stark        0.26
sharp        0.26
light        0.25
fact         0.24
marked       0.23
line         0.23
particular   0.23
contrad      0.23
Activations Density: 0.168%