INDEX
Explanations
negative evaluations or comparisons, particularly in the context of performance or quality
Head Attr Weights
0:0.01
1:0.02
2:0.08
3:0.32
4:0.01
5:0.01
6:0.14
7:0.10
8:0.04
9:0.07
10:0.05
11:0.08
Negative Logits
unal: -1.34
aepernick: -1.26
agg: -1.13
ryu: -1.10
same: -1.08
mand: -1.06
nexus: -1.06
truce: -1.02
Love: -1.02
elcome: -1.01
Positive Logits
charts: 1.22
iceberg: 1.12
ital: 1.11
tein: 1.11
Rise: 1.05
react: 1.05
Rai: 1.05
Carbuncle: 1.04
earners: 1.04
Rend: 1.04
Activations Density: 0.039%