INDEX
Explanations
instances of surprise or astonishment in the text
Negative Logits
Token          Logit
Vig            -0.17
ationship      -0.17
olec           -0.15
忍             -0.14
Dra            -0.14
KT             -0.14
ctor           -0.13
odial          -0.13
ifice          -0.13
holes          -0.13
Positive Logits
Token          Logit
_COMPAT         0.16
asma            0.16
lys             0.15
143             0.14
ActionTypes     0.14
hq              0.14
Miller          0.14
머              0.14
unk             0.13
Schmidt         0.13
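The negative and positive logit lists appear to rank vocabulary tokens by how strongly this feature pushes their next-token logits down or up. A common way to build such lists (assumed here, not confirmed by this page) is to project the feature's decoder direction through the model's unembedding matrix and keep the k lowest and k highest entries; with k = 10 this gives ten tokens per list. The function names, the toy matrices, and the choice to ignore the final LayerNorm are all hypothetical simplifications.

```python
def direct_logit_effects(w_dec, W_U):
    """Project a decoder direction through an unembedding matrix.

    Hypothetical setup: w_dec has d_model entries and W_U is a list of
    d_model rows, each with d_vocab entries. The final LayerNorm is ignored.
    """
    d_vocab = len(W_U[0])
    return [
        sum(w_dec[r] * W_U[r][j] for r in range(len(w_dec)))
        for j in range(d_vocab)
    ]


def top_logit_tokens(effects, vocab, k=10):
    """Return the k most negative and k most positive (token, effect) pairs."""
    order = sorted(range(len(effects)), key=lambda i: effects[i])  # ascending
    negative = [(vocab[i], effects[i]) for i in order[:k]]
    positive = [(vocab[i], effects[i]) for i in order[::-1][:k]]  # descending
    return negative, positive


# Toy example with d_model = 2 and a three-token vocabulary (made-up numbers).
w_dec = [1.0, 0.0]
W_U = [[0.5, -0.2, 0.1],
       [9.0, 9.0, 9.0]]
vocab = ["a", "b", "c"]
effects = direct_logit_effects(w_dec, W_U)
negative, positive = top_logit_tokens(effects, vocab, k=1)
print(negative, positive)  # [('b', -0.2)] [('a', 0.5)]
```

On a real model the same ranking would run over the full vocabulary, and the small magnitudes in the lists (about ±0.13 to ±0.17) suggest this feature's direct effect on any single token is weak.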
Activation Density: 0.007%
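An activation density of 0.007% indicates a very sparse feature. If density is taken to mean the percentage of sampled tokens on which the feature fires (activation above zero), it fires on roughly 7 tokens in every 100,000. A minimal sketch of that calculation follows; the zero threshold, the token sample, and the function name are assumptions rather than the dashboard's documented method.

```python
def activation_density(activations, threshold=0.0):
    """Percentage of token positions where the feature's activation exceeds threshold.

    Assumption: "density" means the share of sampled tokens with activation > 0;
    the sample and threshold used by the dashboard are not stated on this page.
    """
    if not activations:
        raise ValueError("need at least one activation")
    fired = sum(1 for a in activations if a > threshold)
    return 100.0 * fired / len(activations)


# 7 firing positions out of 100,000 tokens corresponds to 0.007%.
acts = [0.0] * 99_993 + [1.3] * 7
print(activation_density(acts))
```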