Explanations
prominent figures or individuals in various contexts
Head Attribution Weights
Head 0: 0.06
Head 1: 0.13
Head 2: 0.03
Head 3: 0.04
Head 4: 0.04
Head 5: 0.31
Head 6: 0.02
Head 7: 0.01
Head 8: 0.07
Head 9: 0.12
Head 10: 0.09
Head 11: 0.04
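The head weights listed above add up to about 0.96. That suggests, though the dashboard does not state it, that each value is a head's share of the feature's total attribution, with the shortfall from 1.0 coming from rounding. Under that assumption, the following minimal sketch ranks the heads. The names `HEAD_WEIGHTS` and `top_heads` are hypothetical.

```python
# Head attribution weights copied from the list above (head index -> weight).
# Assumption: each value is that head's share of the feature's attribution.
HEAD_WEIGHTS = {
    0: 0.06, 1: 0.13, 2: 0.03, 3: 0.04, 4: 0.04, 5: 0.31,
    6: 0.02, 7: 0.01, 8: 0.07, 9: 0.12, 10: 0.09, 11: 0.04,
}


def top_heads(weights, k=3):
    """Return the k (head, weight) pairs with the largest weights."""
    return sorted(weights.items(), key=lambda kv: kv[1], reverse=True)[:k]


total = sum(HEAD_WEIGHTS.values())
print(top_heads(HEAD_WEIGHTS))  # [(5, 0.31), (1, 0.13), (9, 0.12)]
print(f"sum of weights: {total:.2f}")  # sum of weights: 0.96
```

Head 5 alone accounts for roughly a third of the attribution. The next two heads together contribute less than head 5 does.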
Negative Logits
omed: -1.67
κ: -1.59
ipl: -1.51
iple: -1.49
code: -1.45
ologies: -1.44
abases: -1.42
bi: -1.42
fig: -1.40
serial: -1.40
Positive Logits
enegger: 1.78
Pledge: 1.60
salute: 1.57
enthusiastically: 1.56
Geh: 1.47
ersen: 1.46
enjoys: 1.43
passionately: 1.41
yelled: 1.39
wears: 1.38
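The dashboard does not say how these logit values are computed. A common approach for sparse-autoencoder feature dashboards, assumed here rather than confirmed by the source, is to project the feature's decoder direction through the model's unembedding matrix. The most positive and most negative entries are then reported. The following minimal sketch uses toy data, and `decoder_dir`, `W_U`, and `logit_effects` are hypothetical names.

```python
import numpy as np


def logit_effects(decoder_dir, W_U, vocab, k=10):
    """Rank tokens by the direct logit effect of one feature direction.

    decoder_dir: (d_model,) feature direction (hypothetical input).
    W_U: (d_model, vocab_size) unembedding matrix (hypothetical input).
    vocab: list of token strings, one per column of W_U.
    """
    scores = decoder_dir @ W_U      # one score per vocabulary token
    order = np.argsort(scores)      # token indices, lowest score first
    negative = [(vocab[i], float(scores[i])) for i in order[:k]]
    positive = [(vocab[i], float(scores[i])) for i in order[::-1][:k]]
    return positive, negative


# Toy example: 2-dimensional residual stream, 3-token vocabulary.
W_U = np.array([[1.0, 0.0, -1.0],
                [0.0, 1.0, 0.0]])
decoder_dir = np.array([2.0, 1.0])
pos, neg = logit_effects(decoder_dir, W_U, ["a", "b", "c"], k=2)
print(pos)  # [('a', 2.0), ('b', 1.0)]
print(neg)  # [('c', -2.0), ('b', 1.0)]
```

With a real model, the top-k lists would correspond to the Positive Logits and Negative Logits tables, assuming the dashboard uses this projection.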
Activation Density: 0.039%
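Activation density is usually read as the fraction of tokens on which the feature fires. This is an assumption, because the dashboard does not define the threshold it uses. By that reading, 0.039% is about 39 activating tokens per 100,000. The following minimal sketch assumes that any activation above zero counts as firing and uses synthetic data. The function name `activation_density` is hypothetical.

```python
import numpy as np


def activation_density(acts, threshold=0.0):
    """Fraction of tokens on which the feature activation exceeds threshold."""
    acts = np.asarray(acts)
    return float((acts > threshold).mean())


# Synthetic activations: the feature fires on 39 of 100,000 tokens.
acts = np.zeros(100_000)
acts[:39] = 1.0
density = activation_density(acts)
print(f"{density:.3%}")  # 0.039%
```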