INDEX
Explanations
statements regarding research findings and their implications
Negative Logits
pleaſure    -0.88
ſche        -0.84
ſtate       -0.75
houſe       -0.73
ſever       -0.73
Reſ         -0.73
faſt        -0.70
greateſt    -0.69
Majefty     -0.69
laſt        -0.68
Positive Logits
use            0.69
creation       0.63
manufacture    0.63
AsUp           0.55
creation       0.54
provision      0.50
spread         0.49
development    0.49
hånd           0.48
formation      0.48
Activations Density: 0.507%