INDEX
Explanations
phrases related to family and social relationships
Negative Logits
Token        Logit
bridges      -0.75
Gate         -0.67
issu         -0.65
worthiness   -0.63
uncture      -0.63
resy         -0.63
repetition   -0.62
surpassed    -0.61
vell         -0.61
ivably       -0.60
Positive Logits
Token        Logit
mates        0.85
mates        0.82
downstairs   0.77
belongings   0.76
forts        0.70
cousins      0.70
selves       0.70
sleeping     0.70
goodbye      0.69
emate        0.66
Activations Density: 0.248%