INDEX
Explanations
determiners and articles in the text
means or method
Negative Logits
Còn       -0.57
itſelf    -0.52
mijne     -0.51
sibi      -0.50
kysy      -0.48
zijne     -0.45
nôtre     -0.43
Theſe     -0.43
fédé      -0.41
russes    -0.40
Positive Logits
through   1.05
Through   1.02
Through   0.98
through   0.97
THROUGH   0.94
Thru      0.91
thru      0.90
Thru      0.85
via       0.83
thru      0.82
Activations Density: 0.018%