INDEX
Explanations
phrases related to personal relationships and connections
references to relationships and social connections
Negative Logits
meaningless    -0.80
reversible     -0.80
versible       -0.75
pointless      -0.71
apocalypse     -0.70
stagn          -0.67
virgin         -0.66
stupid         -0.64
tomorrow       -0.64
mayhem         -0.63
Positive Logits
intimately     0.89
fond           0.86
mentors        0.83
Favorite       0.80
befriend       0.80
mentor         0.79
longtime       0.78
sidx           0.78
knows          0.77
classmate      0.76
Activations Density: 0.772%