INDEX
Explanations
references to the name "Richard."
Negative Logits
iron     -0.17
gaard    -0.17
133      -0.15
arend    -0.15
agher    -0.15
ightly   -0.15
uni      -0.14
bjerg    -0.14
reen     -0.14
奴       -0.14
Positive Logits
sons     0.29
son      0.29
sonian   0.25
Nixon    0.25
Fey      0.24
gere     0.23
SON      0.21
Ñģон     0.21
III      0.20
Daw      0.19
Activations Density 0.011%