INDEX
Explanations
references to critical or impactful societal events and issues
Negative Logits
line       -0.45
ored       -0.24
imuth      -0.21
adecimal   -0.21
LINE       -0.21
htub       -0.20
ground     -0.20
down       -0.20
ewidth     -0.20
ulace      -0.20
Positive Logits
wards      0.17
ory        0.17
ORY        0.17
PHA        0.16
Pes        0.15
ursion     0.14
Teeth      0.14
Baron      0.14
396        0.14
atters     0.14
Activations Density: 0.151%