INDEX
Explanations
proper nouns, specifically names of individuals or organizations
Negative Logits
tas
-0.81
tó
-0.63
e
-0.63
ا
-0.62
the
-0.61
able
-0.59
a
-0.58
travel
-0.58
té
-0.57
d
-0.57
Positive Logits
nnnn
0.70
nnn
0.66
̩
0.59
ostavi
0.58
ThroughAttribute
0.57
intptr
0.57
tann
0.57
}{||
0.56
houſe
0.56
__*/
0.55
Activations Density 0.278%