Explanations
phrases and concepts associated with political structures and actions
Negative Logits
similarly     -0.41
similar       -0.35
similar       -0.34
imilar        -0.34
podob         -0.33
comparable    -0.31
Similar       -0.30
Similar       -0.29
simil         -0.28
подоб         -0.27
Positive Logits
same          0.49
same          0.49
Same          0.48
Same          0.47
mismo         0.39
SAME          0.38
同            0.34
_same         0.33
misma         0.33
mesma         0.32
Activations Density 0.138%