INDEX
Explanations
phrases indicating hierarchical or associative relationships
Negative Logits
faſt       -0.91
juſ        -0.90
ſta        -0.88
ſche       -0.86
pleaſure   -0.85
ſever      -0.80
ſtate      -0.77
purpoſe    -0.77
anſ        -0.76
ſtand      -0.75
Positive Logits
of     2.14
Of     1.25
OF     1.15
Of     1.09
of     1.09
của    1.08
ของ    1.02
של     0.87
オブ   0.85
ऑफ     0.82
Activations Density: 1.582%
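
The density figure is usually read as the share of token positions on which the feature fires. The page does not say how it was computed, so the following is only a minimal sketch under that assumption: the function name `activation_density`, the `threshold` parameter, and the sample values are hypothetical and synthetic, not taken from this feature's data.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of positions whose activation exceeds `threshold`.

    Assumption: "density" means the share of tokens on which the feature
    is active. The dashboard itself does not state its definition.
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Synthetic example: 1000 token positions, 16 of them active.
sample = [0.0] * 984 + [0.7, 1.2, 0.3, 2.5, 0.1, 0.9, 1.1, 0.4,
                        0.6, 3.0, 0.2, 0.8, 1.5, 0.05, 2.2, 0.35]

density = activation_density(sample)
print(f"{density:.3%}")
```

A value such as 1.582% would then mean the feature is active on roughly 16 of every 1000 tokens, which fits a feature tied to one common function word and its translations.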