Explanations
social media handles for various individuals and platforms
references to social media interactions and follow requests
Negative Logits
anan        -0.63
chwitz      -0.61
ring        -0.60
unanim      -0.60
uable       -0.59
massac      -0.59
aughtered   -0.59
ingen       -0.58
cific       -0.58
surplus     -0.57
Positive Logits
1.68
1.62
1.48
1.34
1.25
1.21
1.20
1.19
1.15
1.14
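Dashboards of this kind commonly derive their logit lists by projecting the feature's decoder direction onto the model's unembedding matrix and ranking the per-token scores. The sketch below assumes that method, which this page does not confirm. The names `decoder_row`, `w_u`, `logit_effects` and `top_tokens` are hypothetical, and the weights are toy values rather than real model weights.

```python
from typing import List, Tuple


def logit_effects(decoder_row: List[float], w_u: List[List[float]]) -> List[float]:
    """Project one feature's decoder direction onto the unembedding.

    decoder_row: length d_model (hypothetical stand-in for W_dec[i]).
    w_u: d_model rows, each of length vocab_size (hypothetical stand-in for W_U).
    Returns one direct logit effect per vocabulary token.
    """
    vocab_size = len(w_u[0])
    return [
        sum(decoder_row[j] * w_u[j][t] for j in range(len(decoder_row)))
        for t in range(vocab_size)
    ]


def top_tokens(
    effects: List[float], vocab: List[str], k: int, positive: bool = True
) -> List[Tuple[str, float]]:
    """Return the k tokens with the largest (positive=True) or smallest effects."""
    order = sorted(range(len(effects)), key=lambda t: effects[t], reverse=positive)
    return [(vocab[t], round(effects[t], 2)) for t in order[:k]]


# Toy example: d_model = 2, vocabulary of three tokens.
vocab = ["a", "b", "c"]
decoder_row = [1.0, -0.5]
w_u = [[0.2, -0.3, 0.5], [0.4, 0.1, -0.6]]
effects = logit_effects(decoder_row, w_u)
print("positive:", top_tokens(effects, vocab, k=1, positive=True))
print("negative:", top_tokens(effects, vocab, k=1, positive=False))
```

In this toy setup, token "c" has the largest effect (0.8) and "b" the most negative (-0.35). That mirrors how the two lists above pair each token with a signed score.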
Activations Density 0.037%
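Activation density is usually the percentage of sampled tokens on which the feature activates. The minimal sketch below assumes that definition and a firing threshold of zero; neither is stated on this page. `activation_density` is a hypothetical helper, and the sample is made up. A figure of 0.037% would correspond to, for example, 37 firing tokens out of 100,000.

```python
from typing import Iterable


def activation_density(activations: Iterable[float], threshold: float = 0.0) -> float:
    """Percentage of tokens whose feature activation exceeds `threshold`.

    Assumes "density" means the fraction of sampled tokens on which the
    feature fires; the 0.0 threshold is an assumption, not taken from the page.
    """
    acts = list(activations)
    if not acts:
        raise ValueError("need at least one activation")
    firing = sum(1 for a in acts if a > threshold)
    return 100.0 * firing / len(acts)


# Hypothetical sample: 37 firing tokens out of 100,000.
sample = [0.0] * 99_963 + [1.5] * 37
print(f"{activation_density(sample):.3f}%")
```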