Explanations
proper nouns that refer to specific locations or entities
place names or geographical locations
Negative Logits
ptoms          -0.81
vous           -0.77
ratulations    -0.73
andals         -0.72
enegger        -0.72
raints         -0.68
sylv           -0.66
berman         -0.65
doms           -0.64
ilies          -0.64
Positive Logits
behest         0.91
intersections  0.87
Aren           0.84
expense        0.83
apiece         0.81
bury           0.74
checkpoints    0.71
Point          0.70
checkpoint     0.70
helm           0.68
Activations Density 0.286%