INDEX
Explanations
expressions indicating personal feelings and self-reflection
Negative Logits
20439       -0.97
iq          -0.66
icals       -0.64
iaries      -0.64
mons        -0.63
APH         -0.63
ewitness    -0.63
grave       -0.62
ãĥīãĥ©      -0.61
stros       -0.59
Positive Logits
underest          0.86
misunder          0.79
underestimated    0.78
sensing           0.78
kinda             0.76
luck              0.74
overest           0.74
somew             0.73
forg              0.72
depends           0.71
Activations Density 0.038%