INDEX
Explanations
mentions of foundations or organizations
Negative Logits
oline    -0.17
275      -0.17
338      -0.16
sik      -0.15
isions   -0.15
union    -0.15
TINGS    -0.15
uxtap    -0.15
apas     -0.15
272      -0.15
POSITIVE LOGITS
/Foundation  0.24
ally         0.19
ary          0.19
lation       0.18
lay          0.18
aire         0.18
ality        0.17
aries        0.17
ry           0.17
nal          0.16
Activations Density 0.021%