INDEX
Explanations
repetitive phrases or structures involving the word "of"
Negative Logits
faſt  -0.85
ſta  -0.84
juſ  -0.79
pleaſure  -0.79
ſche  -0.79
purpoſe  -0.75
viſ  -0.74
raiſ  -0.73
ſte  -0.73
ſtate  -0.72
Positive Logits
of  1.88
Of  1.25
OF  1.25
Of  1.16
of  1.10
của  1.08
ของ  1.04
オブ  0.94
ऑफ  0.88
של  0.87
Activations Density: 1.574%