Explanations
references or mentions of specific individuals or titles
Negative Logits
Token             Logit
aarrggbb          -1.17
myſelf            -1.08
varandra          -1.00
+#+#              -0.99
CloseOperation    -0.97
itſelf            -0.97
mijne             -0.96
feroit            -0.95
Efq               -0.95
Monfieur          -0.95
Positive Logits
Token             Logit
en                0.62
F                 0.56
B                 0.53
a                 0.51
is                0.51
in                0.50
↵↵                0.50
.                 0.49
ansky             0.48
Ho                0.48
Activations Density: 0.350%