INDEX
Explanations
the word "typical" and related adverbs
Negative Logits
Monfieur    -1.66
Jefus       -1.65
Diſ         -1.63
Efq         -1.62
houſe       -1.56
Anſ         -1.55
Reſ         -1.54
Majefty     -1.54
Houſe       -1.52
ſeveral     -1.48
Positive Logits
w           0.73
k           0.73
↵           0.70
//          0.69
v           0.66
mu          0.65
us          0.64
j           0.64
end         0.64
//          0.63
Activations Density: 1.729%