INDEX
Explanations
phrases introducing characteristics or descriptions
Head Attr Weights
Head  Weight
0     0.07
1     0.07
2     0.07
3     0.07
4     0.08
5     0.08
6     0.09
7     0.09
8     0.08
9     0.08
10    0.08
11    0.07
Negative Logits
Token         Logit
uyomi         -2.23
��            -2.18
keyes         -1.92
glers         -1.89
jri           -1.89
ursions       -1.84
ashtra        -1.83
vable         -1.81
��            -1.80
sqor          -1.80
Positive Logits
Token         Logit
Diamond       1.93
Cond          1.76
sexist        1.75
homophobic    1.74
Interstitial  1.70
QC            1.69
stating       1.65
Recommend     1.65
wherein       1.64
casting       1.61
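Logit lists like the two above are commonly produced by projecting a feature's decoder direction through the model's unembedding matrix and ranking the resulting per-token values. The sketch below assumes that definition; `logit_effects`, `top_logits`, `decoder_vec`, and `W_U` are hypothetical names chosen for illustration, not taken from this page. If the feature lives in an attention layer's output space, the direction would first need mapping through the output projection, which this sketch omits.

```python
from typing import List, Sequence, Tuple

def logit_effects(decoder_vec: Sequence[float],
                  W_U: Sequence[Sequence[float]]) -> List[float]:
    """Project a decoder direction (length d_model) through an unembedding
    matrix W_U (d_model rows x d_vocab columns): one logit effect per token."""
    d_vocab = len(W_U[0])
    return [sum(decoder_vec[i] * W_U[i][t] for i in range(len(decoder_vec)))
            for t in range(d_vocab)]

def top_logits(decoder_vec: Sequence[float],
               W_U: Sequence[Sequence[float]],
               vocab: Sequence[str],
               k: int = 10) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return (positive, negative) lists of (token, logit) pairs.

    Positive is sorted most-positive first; negative is sorted most-negative
    first, matching the ordering used in the lists above."""
    effects = logit_effects(decoder_vec, W_U)
    ranked = sorted(zip(vocab, effects), key=lambda pair: pair[1])
    negative = ranked[:k]
    positive = ranked[::-1][:k]
    return positive, negative

# Toy example: d_model = 2, vocabulary of three tokens.
pos, neg = top_logits([1.0, -1.0],
                      [[1.0, 0.0, 2.0],
                       [0.0, 1.0, 0.5]],
                      ["a", "b", "c"], k=2)
print(pos)  # [('c', 1.5), ('a', 1.0)]
print(neg)  # [('b', -1.0), ('a', 1.0)]
```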
Activation Density: 0.000%
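Activation density is usually read as the percentage of sampled tokens on which the feature's activation is above zero. Assuming that definition (the zero threshold and the function names below are assumptions, not taken from this page), a value shown as 0.000% at three decimal places means the feature fired on fewer than 0.0005% of sampled tokens, possibly none.

```python
from typing import Iterable

def activation_density(activations: Iterable[float], threshold: float = 0.0) -> float:
    """Percentage of tokens whose activation exceeds `threshold` (assumed 0)."""
    acts = list(activations)
    if not acts:
        raise ValueError("need at least one activation")
    firing = sum(1 for a in acts if a > threshold)
    return 100.0 * firing / len(acts)

def format_density(pct: float) -> str:
    """Render a percentage with three decimals, as in the dashboard line above."""
    return f"{pct:.3f}%"

# One firing token out of a million still rounds to 0.000%.
print(format_density(activation_density([0.0] * 999_999 + [2.5])))  # 0.000%
```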