Explanations
references to individual elements in lists or examples
Negative Logits
li      -0.47
Invoke  -0.41
“       -0.41
hiç     -0.40
.       -0.40
c       -0.40
už      -0.40
opers   -0.39
‘       -0.39
מים     -0.39
Positive Logits
Majefty       1.15
itſelf        1.09
individually  1.05
Efq           1.04
Monfieur      1.03
myſelf        1.01
poffible      0.98
themſelves    0.97
Jefus         0.95
himſelf       0.94
Activations Density 0.323%