INDEX
Explanations
names of individuals
segments of text that are empty or contain no significant content
Negative Logits
Token      Logit
mares      -0.65
pill       -0.60
secut      -0.59
ologies    -0.57
bush       -0.56
oking      -0.56
esthes     -0.55
abre       -0.55
ively      -0.54
orship     -0.54
Positive Logits
Token      Logit
vernment   1.12
iants      0.90
roups      0.89
busters    0.83
raphic     0.81
irlfriend  0.80
omez       0.78
reens      0.77
stones     0.76
glers      0.75
Activations Density: 0.246%