Explanations
proper nouns, specifically names of individuals
prepositions and words that indicate relationships or actions involving agents
Negative Logits
âĶĢâĶĢ        -0.71
etheless    -0.67
RESULTS     -0.65
UNCH        -0.63
Indo        -0.61
FACE        -0.60
Pathfinder  -0.60
Rebirth     -0.60
underwater  -0.60
external    -0.59
Positive Logits
enberg  0.98
ovich   0.97
chin    0.95
zik     0.95
hoff    0.94
reau    0.93
elman   0.92
itsch   0.91
atson   0.91
iler    0.91
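The page does not say how these logit values are produced. One common approach in feature dashboards of this kind is to project a feature's output direction onto each token's unembedding vector and list the largest and smallest results. The sketch below illustrates that idea only: `logit_effects`, `top_and_bottom`, and every vector in it are hypothetical, and the numbers are invented toy values, not the ones behind the lists above.

```python
def logit_effects(direction, unembed):
    """Dot the (assumed) feature direction with each token's unembedding vector."""
    return {
        token: sum(d * w for d, w in zip(direction, vec))
        for token, vec in unembed.items()
    }


def top_and_bottom(effects, k):
    """Return the k most positive and the k most negative tokens."""
    ranked = sorted(effects.items(), key=lambda kv: kv[1], reverse=True)
    most_positive = ranked[:k]
    most_negative = ranked[-k:][::-1]  # most negative first
    return most_positive, most_negative


# Toy 2-dimensional vectors, invented purely for illustration.
direction = [1.0, 0.0]
unembed = {
    "ovich": [0.5, 0.25],
    "hoff": [0.25, 0.5],
    "external": [-0.5, 0.0],
    "UNCH": [-0.25, 1.0],
}

effects = logit_effects(direction, unembed)
positive, negative = top_and_bottom(effects, k=2)
print("positive:", positive)
print("negative:", negative)
```

Under that assumption, surname-like suffixes ranking highest would be consistent with the first explanation above (names of individuals).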
Activations Density 0.093%
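A density of 0.093% corresponds to roughly 93 active token positions per 100,000. A minimal sketch of that arithmetic follows, assuming density means the fraction of token positions where the feature's activation is above zero. The page does not define it, so the function name `activation_density`, the zero threshold, and the toy data are all assumptions.

```python
def activation_density(activations, threshold=0.0):
    """Fraction of token positions whose activation exceeds `threshold`.

    Assumption: "density" is the share of positions where the feature fires.
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Toy data (invented): 93 active positions out of 100,000.
toy = [1.5] * 93 + [0.0] * (100_000 - 93)
density = activation_density(toy)
print(f"{density:.3%}")
```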