INDEX
Explanations
mentions of performance-related contexts in various settings
Negative Logits
Token     Logit
(blank)   -0.68
de        -0.63
vor       -0.57
y         -0.56
der       -0.56
I         -0.55
il        -0.54
D         -0.54
C         -0.54
solid     -0.53
Positive Logits
Token     Logit
Diſ       1.47
Anſ       1.34
ſtate     1.29
Reſ       1.28
Eſ        1.27
myſelf    1.26
Jefus     1.20
itſelf    1.20
ſeveral   1.19
Perſ      1.17
Activations Density 0.171%