INDEX
Explanations
noun phrases indicating outcomes or consequences
phrases that indicate causality or outcomes
Negative Logits
snipp     -0.71
afort     -0.68
redes     -0.63
jug       -0.62
zan       -0.59
suspic    -0.56
lapt      -0.55
flying    -0.55
Fired     -0.54
hello     -0.54
Positive Logits
thereof   1.08
of        1.02
result    0.78
result    0.74
OF        0.74
ainer     0.73
Of        0.68
of        0.66
uating    0.65
liest     0.65
Activations Density: 0.061%