INDEX
Explanations
references to specific individuals or entities in a narrative context
Negative Logits
Token         Logit
ãĥ£           -0.76
ablishment    -0.73
places        -0.71
ãĥ´           -0.69
charge        -0.65
scratch       -0.64
company       -0.64
ãĥ¥           -0.64
definition    -0.63
ãĥ            -0.60
Positive Logits
Token         Logit
owship        1.15
sburgh        1.08
inson         0.98
s             0.95
ski           0.90
sburg         0.89
ateral        0.86
sie           0.85
yth           0.83
inelli        0.82
Activations Density 0.003%
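
As a rough illustration of how a density figure like the one above is commonly read, the sketch below treats activation density as the fraction of token positions on which the feature fires. This is an assumption about the metric, not something the dashboard states. The function name `activation_density`, the zero threshold, and the sample data are all hypothetical.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of token positions whose activation exceeds threshold.

    Assumption: "density" means the share of tokens on which the feature
    is active; the dashboard itself does not spell out its definition.
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical data: the feature fires on 3 of 100,000 token positions.
acts = [0.0] * 100_000
for i in (10, 5_000, 99_999):
    acts[i] = 1.5

density = activation_density(acts)
print(f"{density:.3%}")  # prints 0.003%
```

Under this reading, a density of 0.003% means the feature is active on roughly 3 out of every 100,000 tokens, which would be consistent with a narrow feature that responds mainly to specific name-like contexts.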