INDEX
Explanations
instances of the word "public" and its context, especially in relation to concerns or findings
Head Attr Weights
0:0.02
1:0.02
2:0.06
3:0.07
4:0.18
5:0.03
6:0.04
7:0.32
8:0.03
9:0.04
10:0.07
11:0.06
Negative Logits
lav        -1.78
veyard     -1.59
oother     -1.58
vantage    -1.54
pei        -1.49
microsoft  -1.48
luent      -1.48
ocratic    -1.47
isites     -1.46
tie        -1.45
Positive Logits
wrongdoing   1.68
abnorm       1.48
inacc        1.48
arthed       1.47
エル         1.42
commits      1.40
boldly       1.40
terday       1.39
details      1.39
accusations  1.38
Activations Density 0.001%