INDEX
Explanations
references to personal relationships and interactions
Negative Logits
                 -0.53
WithTag          -0.53
kof              -0.51
IndentedString   -0.50
ordering         -0.47
appraisal        -0.47
bulunabilir      -0.47
doty             -0.46
intercession     -0.46
pomo             -0.45
Positive Logits
never            1.06
can              0.97
always           0.97
have             0.97
would            0.96
had              0.95
didn             0.95
still            0.93
don              0.90
will             0.87
Activations Density 1.120%