INDEX
Explanations
actions or intentions related to knowing or understanding
statements expressing certainty or confidence in one's knowledge or abilities
Negative Logits
vig             -0.65
hement          -0.63
anytime         -0.62
姫              -0.61
Provided        -0.61
mony            -0.60
Awareness       -0.58
somew           -0.58
Lives           -0.57
Transparency    -0.56
Positive Logits
sbm             0.75
talking         0.73
/$              0.70
rowing          0.65
READ            0.65
doing           0.65
barg            0.63
PLA             0.63
supposed        0.63
talking         0.63
Activations Density 0.120%