Explanations
phrases indicating probability or likelihood
Negative Logits
idum          -0.70
providedIn    -0.67
Фор           -0.64
ByVersion     -0.63
pstmt         -0.63
andaag        -0.62
everybody     -0.62
Everybody     -0.61
twimg         -0.61
Schwe         -0.61
Positive Logits
likely        2.99
likely        2.77
Likely        2.77
Likely        2.52
LIK           1.72
unlikely      1.64
unlikely      1.58
likelihood    1.57
Likelihood    1.41
likelihood    1.36
Activations Density: 0.063%