Explanations
mentions of groups, communities, or collectives involving people or entities
Negative Logits
-animate   -0.15
cons       -0.15
andest     -0.14
enou       -0.14
ppo        -0.14
/ec        -0.14
iani       -0.14
еко        -0.14
irst       -0.13
pump       -0.13
Positive Logits
RITE       0.19
lak        0.17
aux        0.16
tere       0.16
ball       0.15
both       0.15
ussels     0.15
Ludwig     0.15
Both       0.14
ele        0.14
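The two lists above are the highest- and lowest-valued entries of a per-token logit vector. A minimal sketch of that selection step follows, assuming (as is common for SAE feature dashboards, though not stated here) that each value is the feature's decoder direction projected onto the model's unembedding. The helper `top_and_bottom_logits` is hypothetical, and `sample` is only a small excerpt of the values listed above, not the full vocabulary.

```python
from __future__ import annotations

import heapq


def top_and_bottom_logits(
    logits: dict[str, float], k: int
) -> tuple[list[tuple[str, float]], list[tuple[str, float]]]:
    """Return the k highest and k lowest (token, logit) pairs.

    Hypothetical helper: ties keep the dict's insertion order, because
    heapq.nlargest/nsmallest behave like a stable sort truncated to k.
    """
    positive = heapq.nlargest(k, logits.items(), key=lambda kv: kv[1])
    negative = heapq.nsmallest(k, logits.items(), key=lambda kv: kv[1])
    return positive, negative


# Excerpt of the dashboard values (illustrative subset only).
sample = {
    "RITE": 0.19,
    "lak": 0.17,
    "aux": 0.16,
    "-animate": -0.15,
    "cons": -0.15,
    "pump": -0.13,
}

pos, neg = top_and_bottom_logits(sample, 2)
print(pos)  # [('RITE', 0.19), ('lak', 0.17)]
print(neg)  # [('-animate', -0.15), ('cons', -0.15)]
```

On a real model the same selection would run over the full vocabulary (tens of thousands of tokens), which is why odd sub-word fragments such as "ussels" or "еко" can surface in either list.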
Activation Density: 0.389%
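Activation density is usually reported as the fraction of token positions on which the feature fires. A minimal sketch, assuming "fires" means an activation strictly above zero (the dashboard does not state its threshold); the activation values below are hypothetical and chosen so that 389 of 100,000 tokens fire, which reproduces the 0.389% figure.

```python
def activation_density(activations: list[float], threshold: float = 0.0) -> float:
    """Fraction of token positions whose activation exceeds `threshold`."""
    if not activations:
        raise ValueError("need at least one activation")
    fired = sum(1 for a in activations if a > threshold)
    return fired / len(activations)


# Hypothetical data: 100,000 tokens, 389 of which activate the feature.
acts = [0.0] * (100_000 - 389) + [1.2] * 389
density = activation_density(acts)
print(f"{density:.3%}")  # 0.389%
```

A density this low means the feature is silent on almost every token, which fits a narrow concept such as mentions of groups or collectives.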