Explanations
phrases related to policies or advocacy
Negative Logits
Token         Logit
ĸļ            -1.03
ãĤ¼ãĤ¦ãĤ¹     -0.93
Sins          -0.76
Halls         -0.72
Gorge         -0.69
Twain         -0.68
similarities  -0.64
gorge         -0.64
sinks         -0.64
owship        -0.64
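The first two tokens in the negative list look garbled, but they match the way byte-level BPE vocabularies (the GPT-2 family) display raw bytes as printable characters. The sketch below assumes that this feature comes from a model using the GPT-2 byte-to-character table, which the page itself does not state. The function names `gpt2_byte_decoder` and `token_to_bytes` are illustrative and not part of any library.

```python
def gpt2_byte_decoder():
    """Map each display character back to the raw byte it stands for.

    Assumption: the vocabulary uses the GPT-2 byte-to-character table.
    """
    # Bytes displayed as themselves: printable ASCII plus two Latin-1 ranges.
    keep = list(range(ord("!"), ord("~") + 1))
    keep += list(range(ord("¡"), ord("¬") + 1))
    keep += list(range(ord("®"), ord("ÿ") + 1))
    decoder = {chr(b): b for b in keep}
    # Every remaining byte is shifted to code points 256, 257, ... in byte order.
    shift = 0
    for b in range(256):
        if b not in keep:
            decoder[chr(256 + shift)] = b
            shift += 1
    return decoder


def token_to_bytes(token):
    """Convert a displayed vocabulary token to the bytes it encodes."""
    decoder = gpt2_byte_decoder()
    return bytes(decoder[ch] for ch in token)


print(token_to_bytes("ãĤ¼ãĤ¦ãĤ¹").decode("utf-8"))  # ゼウス
print(token_to_bytes("ĸļ"))                          # b'\x96\x9a'
```

Under that assumption, the second token is the katakana string ゼウス. The first token is two UTF-8 continuation bytes that do not form a complete character on their own, so it has no readable form and is best left as displayed.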
Positive Logits
Token         Logit
digy          1.52
verbs         1.24
ctor          1.18
actively      1.18
dding         1.16
pelling       1.16
strate        1.15
ccess         1.10
gression      1.06
dig           1.06
Activations Density 0.012%
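To give a sense of scale for the density figure, here is a minimal sketch. It assumes that density means the fraction of sampled tokens on which the feature has a nonzero activation, which this page does not define. The function name `activation_density` and the sample values are made up for illustration.

```python
def activation_density(activations):
    """Fraction of tokens with a strictly positive activation (assumed definition)."""
    if not activations:
        raise ValueError("need at least one activation value")
    active = sum(1 for a in activations if a > 0)
    return active / len(activations)


# Hypothetical sample: 100,000 tokens, of which 12 activate the feature.
sample = [0.0] * 99_988 + [3.2] * 12
print(f"{activation_density(sample):.3%}")  # 0.012%
```

On this reading, a density of 0.012% corresponds to roughly 12 active tokens per 100,000.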