INDEX
Explanations
phrases related to reasons or motivations
references to motivation and reasons for actions
Negative Logits
roth                -0.71
gravity             -0.67
ertain              -0.62
GA                  -0.61
eer                 -0.59
Present             -0.57
TRANS               -0.57
dden                -0.56
=-=-=-=-=-=-=-=-    -0.56
UGE                 -0.56
Positive Logits
upstream            0.70
retty               0.69
sometime            0.68
facult              0.66
awhile              0.65
midday              0.65
obe                 0.65
©¶æ                 0.65
spor                0.64
ename               0.64
Activations Density 0.600%