INDEX
Explanations
references to people and their beliefs or conditions
Negative Logits
Token          Logit
Did            -0.62
did            -0.60
Did            -0.60
AttributeSet   -0.56
did            -0.55
Does           -0.52
Didn           -0.51
Has            -0.48
Doesn          -0.48
Does           -0.48
Positive Logits
Token          Logit
is             1.37
ValueStyle     0.96
are            0.94
είναι          0.92
là             0.86
expandindo     0.85
帖最后由        0.82
فريبيس         0.82
GEBURTSDATUM   0.82
jesteś         0.78
Activations Density 0.352%