INDEX
Explanations
the presence of specific confirmation or acknowledgment phrases in various contexts
prepositions of origin
Negative Logits
neceſſ      -0.52
deleteUser  -0.52
getSize     -0.51
ſelf        -0.48
houſe       -0.46
uxxxx       -0.45
pleaſure    -0.45
myſelf      -0.44
Figure      -0.44
nettsted    -0.44
Positive Logits
от    1.60
від   1.32
От    1.04
От    1.02
from  0.94
от    0.91
Від   0.88
Від   0.82
từ    0.82
od    0.82
Activations Density 0.001%