INDEX
Explanations
names of individuals which could be from different contexts or scenarios
proper nouns, specifically names of people
Negative Logits
į      -0.65
LEASE  -0.63
Bent   -0.61
Pixie  -0.61
RJ     -0.60
¯      -0.60
ÑĮ     -0.59
Tup    -0.59
Alto   -0.58
Lex    -0.58
Positive Logits
enegger    1.21
testified  0.98
himself    0.84
herself    0.80
told       0.78
remembers  0.77
wrote      0.77
oversaw    0.76
underwent  0.76
admitted   0.75
Activations Density 0.131%