INDEX
Explanations
evaluations and classifications of people or entities
Head Attr Weights
0:0.13
1:0.13
2:0.03
3:0.05
4:0.05
5:0.22
6:0.03
7:0.03
8:0.11
9:0.05
10:0.06
11:0.06
Negative Logits
olin
-1.51
defense
-1.46
Patch
-1.39
Guard
-1.39
[&
-1.37
playbook
-1.35
Berger
-1.34
ernels
-1.33
stack
-1.32
payoff
-1.32
Positive Logits
='
1.68
ERY
1.68
ALSE
1.59
('
1.59
raph
1.49
below
1.45
ا
1.45
abbrevi
1.40
uphem
1.37
asel
1.33
Activations Density 0.016%