INDEX
Explanations
references to personal experiences and self-reflection
Negative Logits
Provided    -0.16
suggested   -0.16
supplied    -0.16
offered     -0.15
provided    -0.15
left        -0.15
aginator    -0.14
lettes      -0.14
advised     -0.14
Allowed     -0.14
Positive Logits
saw         0.42
Saw         0.38
heard       0.36
heard       0.31
seen        0.30
seen        0.29
spotted     0.28
Heard       0.28
Seen        0.27
Seen        0.27
Activations Density 0.276%