INDEX
Explanations
the acronym "NP" followed by a single-digit number
references to named entities or categories in a document
Negative Logits
hips      -0.82
ingham    -0.73
zona      -0.73
borgh     -0.71
iances    -0.69
mir       -0.69
âĸ¬       -0.68
bell      -0.68
hower     -0.67
tered     -0.67
Positive Logits
NP        1.03
NP        0.91
oint      0.77
SS        0.77
ointed    0.73
PP        0.72
emonic    0.72
ublic     0.71
EED       0.71
hemer     0.71
Activations Density: 0.006%