INDEX
Explanations
political figures and their positions
Negative Logits
Counter     -0.71
Chop        -0.69
Bend        -0.69
CT          -0.67
Jem         -0.67
SCP         -0.67
PB          -0.66
Ember       -0.66
Impact      -0.65
outweigh    -0.65
Positive Logits
28          1.49
26          1.49
23          1.46
48          1.46
29          1.45
24          1.44
19          1.44
39          1.43
54          1.43
27          1.43
Activations Density: 0.571%