INDEX
Explanations
proper names or people's names
references to individuals named Lauren
Negative Logits
*/(         -0.79
lication    -0.74
LEASE       -0.66
Versions    -0.65
scorp       -0.64
neurot      -0.64
tempered    -0.62
Downloadha  -0.61
udeb        -0.61
icted       -0.61
Positive Logits
uren        0.96
Bac         0.87
Lauren      0.84
ists        0.82
ima         0.80
Faust       0.79
Greene      0.75
uthor       0.75
igue        0.75
Vander      0.75
Activations Density: 0.013%