INDEX
Explanations
expressions of opinion or belief about a subject
Head Attribution Weights
0:0.07
1:0.02
2:0.13
3:0.17
4:0.09
5:0.04
6:0.06
7:0.11
8:0.08
9:0.05
10:0.07
11:0.04
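The weights above can be read by ranking heads from largest to smallest weight. The sketch below copies the values from this list and uses a hypothetical helper, `top_heads`. The listed weights sum to about 0.93 rather than 1, so the sketch treats them as raw weights and does not normalize them into a distribution.

```python
# Head attribution weights as listed above (head index -> weight).
HEAD_ATTR_WEIGHTS = {
    0: 0.07, 1: 0.02, 2: 0.13, 3: 0.17, 4: 0.09, 5: 0.04,
    6: 0.06, 7: 0.11, 8: 0.08, 9: 0.05, 10: 0.07, 11: 0.04,
}


def top_heads(weights, k=3):
    """Return the k (head, weight) pairs with the largest weight, highest first."""
    return sorted(weights.items(), key=lambda kv: kv[1], reverse=True)[:k]


print(top_heads(HEAD_ATTR_WEIGHTS))  # [(3, 0.17), (2, 0.13), (7, 0.11)]
```

On these values, heads 3, 2 and 7 carry the largest share of the attribution.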
Negative Logits
Init                  -1.54
intermedi             -1.54
apparatus             -1.51
entry                 -1.50
vati                  -1.50
induce                -1.48
rendering             -1.45
discour               -1.43
guiActiveUnfocused    -1.41
maintenance           -1.40
Positive Logits
:#                    2.13
�                     2.10
️ (U+FE0F)            1.88
arest                 1.81
BUG                   1.80
─                     1.79
*:                    1.78
:-                    1.77
english               1.77
%:                    1.75
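Logit lists like the two above are often produced by projecting a feature's output direction onto the model's unembedding matrix and keeping the highest and lowest token scores. This page does not say how its lists were computed, so the sketch below assumes that method. The helpers `logit_effects` and `top_and_bottom` are hypothetical, and so is the toy model: a 2-dimensional residual stream with a four-token vocabulary.

```python
def logit_effects(direction, w_u, vocab):
    """Score each token by the dot product of `direction` with its unembedding column.

    Assumes w_u is a d_model x vocab_size matrix given as a list of rows.
    """
    if len(w_u) != len(direction):
        raise ValueError("w_u must have one row per dimension of `direction`")
    scores = []
    for j, token in enumerate(vocab):
        score = sum(direction[i] * w_u[i][j] for i in range(len(direction)))
        scores.append((token, score))
    return scores


def top_and_bottom(scores, k):
    """Return the k highest-scoring tokens (highest first) and the k lowest (lowest first)."""
    ranked = sorted(scores, key=lambda ts: ts[1], reverse=True)
    return ranked[:k], ranked[::-1][:k]


# Hypothetical toy example: 2-dimensional direction, 4-token vocabulary.
direction = [1.0, -0.5]
w_u = [[2.0, 0.0, -1.0, 0.5],
       [0.0, 2.0, 1.0, -1.0]]
positive, negative = top_and_bottom(logit_effects(direction, w_u, ["a", "b", "c", "d"]), k=2)
print(positive)  # [('a', 2.0), ('d', 1.0)]
print(negative)  # [('c', -1.5), ('b', -1.0)]
```

A real model would use the feature's actual decoder direction and a full unembedding matrix. The ranking step is the same at any scale.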
Activation Density: 0.000%
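Activation density is commonly the percentage of sampled tokens on which the feature's activation is above zero. The sketch below assumes that definition and uses a hypothetical helper, `activation_density`. It also shows that a figure displayed to three decimals as 0.000% does not necessarily mean zero firings. Any density below 0.0005% rounds to that value.

```python
def activation_density(activations, threshold=0.0):
    """Percentage of tokens whose activation exceeds `threshold`."""
    if not activations:
        raise ValueError("need at least one activation")
    firing = sum(1 for a in activations if a > threshold)
    return 100.0 * firing / len(activations)


print(f"{activation_density([0.0, 0.0, 1.2, 0.0]):.3f}%")  # 25.000%

# One firing in a million tokens is 0.0001%, which displays as 0.000%.
print(f"{activation_density([1.0] + [0.0] * 999_999):.3f}%")  # 0.000%
```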