INDEX
Explanations
proper nouns related to people or places
occurrences of the name "Hal" in various contexts
Negative Logits
龍契士      -0.82
方          -0.75
さ          -0.72
URES        -0.71
hered       -0.69
き          -0.67
CRIP        -0.66
REDACTED    -0.64
生          -0.64
Regular     -0.64
Positive Logits
ifax        1.24
ftime       1.16
ibur        1.10
ocaust      1.06
ogen        1.06
ving        1.06
iday        1.03
tering      0.96
ved         0.96
ves         0.92
Activations Density 0.036%
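
A density figure like the one above is commonly read as the share of token positions on which the feature has a nonzero activation. The page does not spell out its definition, so the sketch below rests on that assumption; the function name `activation_density` and the toy data are hypothetical and only illustrate the arithmetic.

```python
def activation_density(activations, threshold=0.0):
    """Return the percentage of entries strictly greater than `threshold`.

    Assumption: "density" means the share of token positions where the
    feature fires. This is an illustrative definition, not one taken
    from the page itself.
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


# Hypothetical toy data: 10,000 token positions, 4 of them active.
sample = [0.0] * 9996 + [0.7, 1.3, 0.2, 2.1]
density = activation_density(sample)
print(f"{density:.3f}%")
```

Under this reading, a density of 0.036% would correspond to roughly 36 active positions per 100,000 tokens.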