INDEX
Explanations
proper nouns
unique identifiers or names of people and entities
Negative Logits
00007       -0.63
NK          -0.62
indexes     -0.61
motions     -0.60
IPM         -0.59
rooms       -0.59
directions  -0.58
paw         -0.58
hett        -0.57
precincts   -0.57
Positive Logits
ccording    0.98
pillar      0.87
zona        0.81
ENA         0.80
uthor       0.78
ritz        0.75
©¶æ¥µ       0.75
vantage     0.74
Lago        0.72
nikov       0.72
Activations Density 0.171%