INDEX
Explanations
specific numeric values or identifiers
Negative Logits
McMahon   -0.17
iges      -0.14
104       -0.14
seiz      -0.14
Bender    -0.14
eren      -0.14
zek       -0.14
ayn       -0.14
Burke     -0.14
SES       -0.14
Positive Logits
983       0.28
585       0.25
388       0.25
788       0.24
783       0.23
982       0.23
583       0.23
988       0.23
784       0.23
582       0.22
Activations Density: 0.020%