Explanations
statements about awareness and perception regarding experiences and actions
Negative Logits
prt               -0.15
hcp               -0.14
817               -0.14
iloc              -0.14
misunderstanding  -0.14
avery             -0.13
iten              -0.13
Understanding     -0.13
isman             -0.13
Ratings           -0.13
Positive Logits
notice            0.56
noticed           0.50
notices           0.50
notice            0.47
Notice            0.47
Notice            0.46
noticing          0.44
noticed           0.42
NOTICE            0.38
Noticed           0.37
Activation density: 0.151%