Explanations
terms related to politics, legal systems, and governance
references to common concepts or ideas
Negative Logits
enda      -0.82
asus      -0.80
agate     -0.77
gur       -0.74
eki       -0.74
otos      -0.73
endas     -0.73
apo       -0.73
ega       -0.72
usalem    -0.70
Positive Logits
wealth        1.41
alities       1.06
ancestor      0.99
denomin       0.98
ensical       0.96
places        0.96
ality         0.91
decency       0.81
place         0.81
occurrence    0.78
Activations Density: 0.022%