Explanations
references to specific individuals
references to the concept of "someone" in various contexts
Negative Logits
ories          -0.77
DOS            -0.76
osterone       -0.72
heny           -0.72
tnc            -0.67
enegger        -0.65
aughs          -0.64
EngineDebug    -0.63
ortex          -0.63
interest       -0.61
Positive Logits
else           2.05
Else           1.47
else           1.34
Else           1.31
who            1.01
whose          0.85
whom           0.83
WithNo         0.83
resembling     0.81
who            0.78
Activations Density 0.060%