INDEX
Explanations
phrases related to public perception or beliefs and their implications
Head Attr Weights
0:0.01
1:0.01
2:0.10
3:0.35
4:0.10
5:0.03
6:0.03
7:0.07
8:0.04
9:0.05
10:0.09
11:0.05
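A minimal sketch of how the head attribution weights above might be read, assuming each entry is `head index:weight` and that a larger weight means a larger share of the attribution. The names `head_attr` and `rank_heads` are hypothetical and do not come from the dashboard. As listed, the rounded values sum to 0.93 rather than 1.00, presumably because of rounding.

```python
# Head attribution weights copied from the listing above (head index -> weight).
head_attr = {
    0: 0.01, 1: 0.01, 2: 0.10, 3: 0.35, 4: 0.10, 5: 0.03,
    6: 0.03, 7: 0.07, 8: 0.04, 9: 0.05, 10: 0.09, 11: 0.05,
}


def rank_heads(weights):
    """Return (head, weight) pairs sorted by descending weight.

    Ties keep their original order because Python's sort is stable.
    """
    return sorted(weights.items(), key=lambda kv: kv[1], reverse=True)


ranked = rank_heads(head_attr)
top_head, top_weight = ranked[0]
total = sum(head_attr.values())

print(f"top head: {top_head} (weight {top_weight:.2f})")
print(f"sum of listed weights: {total:.2f}")
```

Under these assumptions, head 3 carries the largest single share (0.35), more than three times the next-largest heads (2 and 4, at 0.10 each).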
Negative Logits
JUST        -2.14
��          -2.06
]           -1.85
giene       -1.83
uers        -1.82
untarily    -1.74
phrine      -1.72
Tam         -1.69
exit        -1.67
��          -1.66
Positive Logits
nowadays    1.87
Comet       1.64
risome      1.61
famously    1.53
Franch      1.51
sightings   1.49
pamph       1.48
talented    1.47
fantastic   1.47
pretty      1.45
Activations Density 0.040%