INDEX
Explanations
specific names, likely related to prominent figures or contributors in a context such as film or sports
Head Attr Weights
0:0.07
1:0.07
2:0.09
3:0.07
4:0.07
5:0.10
6:0.07
7:0.09
8:0.07
9:0.09
10:0.07
11:0.09
Negative Logits
ichick        -2.90
Engineers     -2.67
wires         -2.55
zik           -2.52
sacks         -2.52
stupidity     -2.49
skulls        -2.47
screwed       -2.45
intel         -2.42
othy          -2.41
Positive Logits
Registration  3.35
FN            3.03
Festival      2.95
Hate          2.93
Aff           2.89
Royale        2.75
Regist        2.63
Origin        2.62
Fah           2.62
Gender        2.61
Activations Density: 0.000%