INDEX
Explanations
flags, writing, peace, chains, airplanes
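Negative Logits
Token                               Logit
Style                               0.43
Pink                                0.43
Textile                             0.39
TCE                                 0.39
Class                               0.38
wil                                 0.38
Tier                                0.38
yle                                 0.38
pink                                0.37
all                                 0.37

Positive Logits
Token                               Logit
(token not rendered)                1.13
U+FE0F (emoji variation selector)   1.03
U+FE0E (text variation selector)    0.84
(token not rendered)                0.78
♂️                                  0.76
(token not rendered)                0.70
♀️                                  0.70
♂                                   0.69
(token not rendered)                0.69
♀                                   0.63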
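Logit lists like the ones above are commonly produced on sparse-autoencoder feature dashboards by projecting the feature's decoder direction through the model's unembedding matrix and reporting the tokens with the highest and lowest scores. The sketch below assumes that method; the function name `logit_effects`, the toy two-dimensional vectors, and the three-token vocabulary are hypothetical and not taken from this page.

```python
from typing import Dict, List, Tuple


def logit_effects(
    direction: List[float],
    unembed: Dict[str, List[float]],
    k: int,
) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return the k most promoted and k most suppressed tokens.

    Hypothetical helper. Assumption: each token's score is the dot product
    of the feature's decoder direction with that token's unembedding vector.
    """
    scores = {
        tok: sum(d * w for d, w in zip(direction, vec))
        for tok, vec in unembed.items()
    }
    # Highest scores: tokens the feature pushes up ("positive logits").
    positive = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:k]
    # Lowest scores: tokens the feature pushes down ("negative logits").
    negative = sorted(scores.items(), key=lambda kv: kv[1])[:k]
    return positive, negative


# Toy example with made-up vectors (not real model weights).
toy_unembed = {"♂": [0.7, 0.1], "Style": [-0.4, 0.2], "the": [0.0, 1.0]}
pos, neg = logit_effects([1.0, 0.0], toy_unembed, k=1)
print(pos)  # [('♂', 0.7)]
print(neg)  # [('Style', -0.4)]
```

On a real model the vocabulary has tens of thousands of entries, so a vectorized matrix product would replace the dictionary comprehension; the ranking logic stays the same.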
Activation Density 0.007%
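Activation density is the share of tokens on which the feature fires, so 0.007% corresponds to roughly 7 activating tokens per 100,000. A minimal sketch, assuming "fires" means an activation strictly above zero; the function name `activation_density` and its `threshold` parameter are hypothetical.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Percentage of tokens whose activation exceeds `threshold` (hypothetical helper)."""
    if not activations:
        raise ValueError("need at least one activation")
    fired = sum(1 for a in activations if a > threshold)
    return 100.0 * fired / len(activations)


# 7 firing tokens out of 100,000 gives a density of 0.007%.
acts = [0.0] * 100_000
for i in range(7):
    acts[i * 1000] = 1.5  # arbitrary positive activation value
print(activation_density(acts))  # 0.007
```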