INDEX
Explanations
names or designations of individuals or groups
Negative Logits
itſelf       -0.95
faſt         -0.88
myſelf       -0.88
againſt      -0.81
uſe          -0.77
iſt          -0.74
―――――        -0.73
་་           -0.73
Theſe        -0.72
themſelves   -0.71
Positive Logits
G   1.12
M   1.04
P   1.03
W   1.03
K   1.01
B   1.01
Z   0.99
F   0.99
L   0.98
S   0.98
Activations Density 0.945%