Explanations
mentions of specific individuals, particularly prominent figures or authorities
Head Attribution Weights
0:0.08
1:0.08
2:0.08
3:0.08
4:0.07
5:0.08
6:0.08
7:0.08
8:0.09
9:0.06
10:0.09
11:0.09
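The per-head weights above are close to uniform across the 12 heads, ranging from 0.06 to 0.09. Nothing on the page says how they are computed. The sketch below is one plausible definition, and it is an assumption. It takes a feature's decoder row, which is assumed to be laid out as `n_heads` contiguous slices of size `d_head`. It splits the row per head and reports each slice's L2 norm as a fraction of the summed norms. The function name `head_attribution` is hypothetical.

```python
import math


def head_attribution(decoder_row, n_heads):
    """Hypothetical: share of a feature's decoder norm carried by each head's slice.

    Assumes decoder_row is n_heads contiguous blocks of equal size (d_head).
    """
    if n_heads <= 0 or len(decoder_row) % n_heads != 0:
        raise ValueError("decoder_row length must be a positive multiple of n_heads")
    d_head = len(decoder_row) // n_heads
    norms = []
    for h in range(n_heads):
        chunk = decoder_row[h * d_head:(h + 1) * d_head]
        # L2 norm of this head's slice of the decoder direction
        norms.append(math.sqrt(sum(x * x for x in chunk)))
    total = sum(norms)
    if total == 0:
        return [0.0] * n_heads
    # Normalise so the weights sum to 1 across heads
    return [n / total for n in norms]
```

For example, `head_attribution([3.0, 4.0, 0.0, 0.0, 1.0, 0.0], 3)` gives per-head norms of 5, 0 and 1, and so returns weights of 5/6, 0 and 1/6. On the page, displayed weights that have been rounded to two decimals need not sum exactly to 1. The values listed above sum to 0.96.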
Negative Logits
Pastebin  -1.67
folder  -1.65
tweet  -1.59
archive  -1.56
username  -1.54
guid  -1.50
retina  -1.48
tid  -1.47
archived  -1.46
github  -1.44
Positive Logits
ccording  1.83
��  1.80
��極  1.76
hered  1.72
20439  1.70
�  1.68
illery  1.62
omorph  1.61
estab  1.60
429  1.58
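Dashboards of this kind typically obtain the positive and negative logit lists by projecting a feature's output direction onto the unembedding. The tokens with the largest and smallest scores are then listed. The sketch below assumes a residual-stream `direction` of length `d_model` and an unembedding `W_U` stored as `d_model` rows of `n_vocab` entries. For an attention-output feature, `direction` would itself be assumed to be the decoder row mapped through the output projection. Any layer-norm handling the real dashboard applies is not modelled. The name `logit_effects` is hypothetical.

```python
def logit_effects(direction, W_U, vocab, k):
    """Hypothetical: top-k positive and negative direct logit effects of a direction.

    direction: list of d_model floats (assumed residual-stream direction)
    W_U: list of d_model rows, each a list of n_vocab floats (assumed unembedding)
    vocab: list of n_vocab token strings
    """
    d_model = len(direction)
    scores = []
    for v, token in enumerate(vocab):
        # Dot product of the direction with column v of the unembedding
        s = sum(direction[i] * W_U[i][v] for i in range(d_model))
        scores.append((token, s))
    ranked = sorted(scores, key=lambda pair: pair[1])
    negative = ranked[:k]                      # most negative first
    positive = list(reversed(ranked[-k:]))     # most positive first
    return positive, negative
```

With `direction = [1.0, 0.0]`, `W_U = [[2.0, -1.0, 0.5], [9.0, 9.0, 9.0]]`, `vocab = ["a", "b", "c"]` and `k = 1`, the scores are 2.0, -1.0 and 0.5. The function therefore returns `[("a", 2.0)]` as positive and `[("b", -1.0)]` as negative.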
Activation Density: 0.000%
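Activation density is usually read as the percentage of sampled tokens on which the feature's activation is above zero. The sketch below assumes exactly that definition, a threshold of 0, and a flat list of per-token activations. The helper `activation_density` is hypothetical. Formatted to three decimal places, any density below roughly 0.0005% displays as `0.000%`. A reading of 0.000% therefore means the feature fired on essentially none of the sampled tokens. It does not necessarily mean the count was exactly zero.

```python
def activation_density(activations, threshold=0.0):
    """Hypothetical: percentage of tokens whose activation exceeds threshold."""
    if not activations:
        raise ValueError("need at least one activation")
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)
```

For example, `activation_density([0.0, 0.0, 0.5, 0.0])` returns `25.0`. `f"{activation_density([0.0] * 1000):.3f}%"` renders as `0.000%`.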