INDEX
Explanations
proper nouns, especially names of locations and individuals
named entities, specifically focusing on people and places
Negative Logits
Token     Logit
usalem    -1.04
asus      -1.04
anes      -0.86
gas       -0.83
agall     -0.79
etsk      -0.78
iffin     -0.77
aya       -0.75
agos      -0.75
berra     -0.74
Positive Logits
Token      Logit
NEC        0.73
pod        0.72
rehearsal  0.70
lip        0.67
eed        0.67
horm       0.66
*/(        0.63
ULT        0.63
phrine     0.62
Sussex     0.62
Activations Density: 0.020%