Explanations
phrases related to relationships and strong emotional connections between people
references to community and shared experiences
Negative Logits
report      -0.74
levard      -0.72
urther      -0.70
afort       -0.67
ãĤ®         -0.67
reported    -0.66
reply       -0.66
retard      -0.66
endix       -0.65
llor        -0.65
Positive Logits
underdog        0.81
timeless        0.81
innate          0.80
storytelling    0.79
uniqueness      0.79
masculinity     0.78
lineage         0.76
kins            0.76
simplicity      0.74
masculine       0.74
Activations Density 0.853%