INDEX
Explanations
phrases indicating exclusivity or significance
Head Attr Weights
0:0.08
1:0.07
2:0.09
3:0.09
4:0.08
5:0.09
6:0.07
7:0.08
8:0.08
9:0.08
10:0.08
11:0.07
Negative Logits
++++++++: -2.90
essays: -2.84
essay: -2.68
Convers: -2.64
ø: -2.64
California: -2.62
filmmaker: -2.61
filmmakers: -2.61
poets: -2.59
physicists: -2.58
Positive Logits
Elite: 2.99
yip: 2.81
Cena: 2.80
Beet: 2.74
Jade: 2.73
ndum: 2.71
Celt: 2.65
NX: 2.63
oubted: 2.63
Jaguar: 2.63
Activations Density 0.000%