Explanations
mentions of countries or geographic regions
references to international relations and geopolitical issues
Negative Logits
Token         Logit
oath          -0.88
aven          -0.83
pudding       -0.82
covenant      -0.81
screwed       -0.77
consum        -0.77
prosec        -0.76
enqu          -0.75
neglig        -0.75
abusing       -0.75
Positive Logits
Token         Logit
Yet           1.66
But           1.64
Experts       1.62
However       1.57
Nevertheless  1.56
Recent        1.55
Researchers   1.54
Such          1.53
Others        1.53
Some          1.52
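Dashboards of this kind usually derive the negative and positive logit lists by multiplying the feature's decoder direction by the model's unembedding matrix and keeping the lowest- and highest-scoring tokens. The Python sketch below assumes that convention. The two-dimensional decoder vector, the five-token unembedding, and the function names `feature_logit_effects` and `top_and_bottom` are hypothetical toy values chosen for illustration, not this feature's real weights.

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical toy weights: a 2-d decoder direction and a 5-token unembedding.
DECODER_DIR: List[float] = [1.0, 0.5]
UNEMBED: Dict[str, List[float]] = {
    "Yet": [1.2, 0.9],
    "But": [1.0, 1.0],
    "oath": [-0.6, -0.4],
    "pudding": [-0.5, -0.3],
    "the": [0.1, -0.2],
}


def feature_logit_effects(
    decoder_dir: Sequence[float],
    unembed: Dict[str, Sequence[float]],
) -> Dict[str, float]:
    """Dot the feature's decoder direction with each token's unembedding column."""
    return {
        token: sum(d * u for d, u in zip(decoder_dir, column))
        for token, column in unembed.items()
    }


def top_and_bottom(
    effects: Dict[str, float], k: int
) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return the k most negative and the k most positive (token, logit) pairs."""
    ranked = sorted(effects.items(), key=lambda kv: kv[1])
    negative = ranked[:k]          # ascending: most negative first
    positive = ranked[::-1][:k]    # descending: most positive first
    return negative, positive


effects = feature_logit_effects(DECODER_DIR, UNEMBED)
negative, positive = top_and_bottom(effects, k=2)
print("negative:", negative)
print("positive:", positive)
```

With these toy numbers, "oath" and "pudding" land in the negative list and "Yet" and "But" in the positive list, mirroring the shape of the real lists above.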
Activation Density: 0.394%
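A density of 0.394% means the feature is active on roughly 4 of every 1,000 tokens. Assuming density is measured as the share of token positions whose activation is above zero (a common convention, not confirmed by this page), it can be computed as in the sketch below. The function `activation_density` and the toy activation list are hypothetical.

```python
from typing import Iterable


def activation_density(activations: Iterable[float], threshold: float = 0.0) -> float:
    """Percentage of token positions whose activation exceeds `threshold`.

    Assumption: "active" means strictly greater than zero by default.
    """
    values = list(activations)
    if not values:
        raise ValueError("need at least one activation")
    firing = sum(1 for a in values if a > threshold)
    return 100.0 * firing / len(values)


# Hypothetical toy data: 1,000 token positions, 4 of which fire.
toy_activations = [0.0] * 996 + [2.3, 0.7, 1.1, 4.0]
density = activation_density(toy_activations)
print(f"{density:.3f}%")
```

The toy run gives 0.400%, close to the 0.394% reported above.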