Explanations
names of people and places
prominent figures and their associations in various contexts
Negative Logits
corrid      -0.74
referen     -0.68
thous       -0.65
acebook     -0.61
undermin    -0.60
predec      -0.58
fundament   -0.55
subscript   -0.54
comprom     -0.54
ß           -0.53
Positive Logits
hus         0.72
ank         0.65
hov         0.65
han         0.64
angan       0.63
ho          0.62
ius         0.61
esson       0.61
hu          0.61
hom         0.61
Activation density: 0.712%