INDEX
Explanations
instances where individuals believe they can achieve certain outcomes or accomplish specific tasks
words indicating the ability or possibility of an action
Negative Logits
furt          -0.70
Federation    -0.68
revision      -0.66
Likes         -0.63
Rings         -0.63
rejection     -0.61
Irving        -0.61
Strikes       -0.60
Yards         -0.59
rehearsal     -0.58
Positive Logits
't            1.60
berra         1.18
adian         1.11
NOT           1.08
afford        1.07
ieve          0.89
easily        0.87
isters        0.87
vas           0.86
regulate      0.83
Activations Density: 0.179%