INDEX
Explanations
phrases related to political, economic, or physical forces or influences
references to various forces or influences at play in different contexts
Negative Logits
DERR     -0.79
ership   -0.79
MAL      -0.77
Hop      -0.77
STER     -0.76
gdala    -0.75
ocry     -0.75
Hop      -0.70
CHAT     -0.69
TIT      -0.68
Positive Logits
force      1.05
exerted    1.01
forces     0.96
maj        0.92
mobilized  0.86
forces     0.81
converge   0.80
recruited  0.80
force      0.79
engaged    0.78
Activations Density: 0.024%