INDEX
Explanations
conversational phrases or expressions involving personal pronouns and contractions
Negative Logits
Token        Logit
def          -0.57
main         -0.56
sim          -0.56
base         -0.56
file         -0.55
ver          -0.55
gra          -0.54
franchise    -0.54
bes          -0.54
del          -0.53
Positive Logits
Token        Logit
berdua       0.46
ślub         0.40
Económica    0.39
Erfindung    0.39
ających      0.38
AndEndTag    0.38
Absicht      0.37
wystarczy    0.37
chcą         0.37
triliun      0.37
Activations Density 0.091%