INDEX
Explanations
pronouns representing people or characters
references to personal experiences or actions involving individuals
Negative Logits
Alright       -0.71
tera          -0.64
alter         -0.64
puff          -0.64
Cancel        -0.63
cox           -0.63
Alright       -0.62
legal         -0.62
íķ            -0.62
âķIJ          -0.62
Positive Logits
encountered   1.51
noticed       1.49
encount       1.49
encounter     1.43
hear          1.37
saw           1.37
encounters    1.35
spotted       1.32
find          1.32
encountering  1.31
Activations Density 0.435%