Explanations
instances of the word "probably" followed by an observation or statement
Negative Logits
elight          -0.86
issy            -0.84
iya             -0.84
iates           -0.83
ife             -0.81
aily            -0.80
uctor           -0.80
iate            -0.79
church          -0.76
gian            -0.76
Positive Logits
misunder         0.91
quir             0.83
underestimate    0.82
Ń·               0.82
overest          0.76
exagger          0.74
owe              0.74
ali              0.72
TOR              0.72
©¶æ              0.72
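The logit lists above rank tokens by how strongly the feature pushes their output logits down (negative) or up (positive). The sketch below shows one common way such lists can be produced. It assumes each token's score is the dot product of the feature's decoder direction with that token's unembedding column. The dashboard does not state its exact method, and the function names `feature_logit_effects` and `top_k` are hypothetical.

```python
# Hypothetical sketch: score every vocabulary token by the dot product of a
# feature's decoder direction with that token's unembedding column, then
# take the top-k most positive or most negative scores.
from typing import List, Tuple


def feature_logit_effects(
    decoder_dir: List[float],
    unembed: List[List[float]],
    vocab: List[str],
) -> List[Tuple[str, float]]:
    """Return (token, score) pairs.

    Assumption: `unembed` is d_model x vocab_size, one column per token.
    """
    scores = []
    for t, token in enumerate(vocab):
        # Dot product of the feature direction with token t's unembedding column.
        score = sum(decoder_dir[i] * unembed[i][t] for i in range(len(decoder_dir)))
        scores.append((token, score))
    return scores


def top_k(scores: List[Tuple[str, float]], k: int, positive: bool = True) -> List[Tuple[str, float]]:
    # Descending order gives the positive list; ascending gives the negative list.
    return sorted(scores, key=lambda pair: pair[1], reverse=positive)[:k]


if __name__ == "__main__":
    # Toy example: d_model = 2, vocabulary of three made-up tokens.
    scores = feature_logit_effects(
        [1.0, 0.5],
        [[0.2, -0.4, 0.9], [0.6, 0.1, -0.2]],
        ["a", "b", "c"],
    )
    print(top_k(scores, 2))
    print(top_k(scores, 1, positive=False))
```

Sorting the full score list is fine for a toy vocabulary. For a real vocabulary of tens of thousands of tokens, a partial selection such as `heapq.nlargest` avoids a full sort.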
Activation Density: 8.208%
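The dashboard reports activation density as a single percentage. A minimal sketch of the statistic follows. It assumes density means the share of sampled tokens on which the feature's activation is strictly above zero. The dashboard states neither its threshold nor its sample size, and the function name `activation_density` is hypothetical.

```python
from typing import Iterable


def activation_density(activations: Iterable[float], threshold: float = 0.0) -> float:
    """Percentage of tokens whose feature activation exceeds `threshold`.

    Assumption: "density" = fraction of tokens where the feature fires (> 0).
    """
    acts = list(activations)
    if not acts:
        raise ValueError("need at least one activation value")
    fired = sum(1 for a in acts if a > threshold)
    return 100.0 * fired / len(acts)


print(activation_density([0.0, 1.2, 0.0, 0.0, 3.4, 0.0, 0.0, 0.0]))  # → 25.0
```

Under this definition, a density of 8.208% would mean the feature fires on roughly one token in twelve across whatever sample the dashboard used.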