Explanations
words associated with capable or functional qualities
Negative Logits
Token     Logit
ing       -0.98
n         -0.95
es        -0.88
N         -0.81
m         -0.79
↵↵        -0.77
9         -0.74
T         -0.73
est       -0.73
th        -0.73
Positive Logits
Token         Logit
Efq           1.57
myſelf        1.52
Jefus         1.49
Theſe         1.44
^(@)          1.42
himſelf       1.41
་་            1.40
―――――         1.39
ſeveral       1.37
themſelves    1.34
Activations Density: 0.164%