INDEX
Explanations
phrases indicating likelihood or speculation
statements that express assumptions or speculations about past events
Negative Logits
opian      -0.67
essee      -0.66
ashi       -0.64
ackle      -0.64
ãĤŃ        -0.62
jri        -0.62
Griff      -0.61
pour       -0.61
KI         -0.60
hetical    -0.58
Positive Logits
kidding      0.69
gotten       0.68
sensed       0.65
liked        0.63
©¶æ          0.63
metic        0.60
wondered     0.60
ionics       0.60
appreciated  0.59
REALLY       0.59
Activations Density: 0.087%