INDEX
Explanations
phrases related to options, choices, or selection processes
Negative Logits
pre                  -0.48
(token not shown)    -0.46
1                    -0.41
E                    -0.41
pre                  -0.40
pré                  -0.40
()                   -0.40
pr                   -0.40
pr                   -0.39
É                    -0.39
Positive Logits
ujednoznacz          1.16
myſelf               1.11
purpoſe              1.09
themſelves           1.05
itſelf               1.05
Jefus                1.03
Theſe                1.02
+#+#                 1.01
whoſe                0.97
himſelf              0.96
Activations Density: 1.009%