Explanations
references to status, reputation, and legitimacy in various contexts
Head Attr Weights
0:0.02
1:0.02
2:0.08
3:0.13
4:0.02
5:0.04
6:0.06
7:0.07
8:0.09
9:0.21
10:0.05
11:0.17
Negative Logits
Shutterstock:-1.29
avid:-1.23
alone:-1.16
uploads:-1.14
videos:-1.13
blindly:-1.12
bombard:-1.07
tabl:-1.07
udi:-1.05
scams:-1.04
Positive Logits
victory:1.39
pletion:1.36
omen:1.35
distinction:1.32
iction:1.29
inction:1.26
clin:1.25
ointment:1.23
favorite:1.22
winning:1.21
Activations Density 0.009%