INDEX
Explanations
terms related to the assessment of performance or evaluation
Head Attr Weights
0:0.03
1:0.03
2:0.08
3:0.31
4:0.03
5:0.02
6:0.13
7:0.08
8:0.06
9:0.06
10:0.08
11:0.04
Negative Logits
�           -1.30
�           -1.23
sugg        -1.23
achus       -1.20
yip         -1.19
ulum        -1.18
govtrack    -1.17
Friends     -1.13
enance      -1.13
States      -1.10
Positive Logits
Malfoy          1.18
Misty           1.06
ministic        0.98
Orig            0.98
goblin          0.97
LSD             0.93
Audi            0.93
dit             0.93
tarian          0.92
encyclopedia    0.92
Activations Density 0.009%