Explanations
phrases related to news headlines and book titles
titles of media and references to their content
Negative Logits
periphery       -0.92
linkage         -0.87
suspension      -0.84
prohibition     -0.82
prohibitions    -0.81
optics          -0.81
caution         -0.80
rendering       -0.80
restraint       -0.80
corridor        -0.79
Positive Logits
Changed         1.48
Own             1.43
Wrong           1.42
Worse           1.37
Happ            1.36
Alone           1.35
Different       1.34
Been            1.33
Believe         1.33
Killed          1.32
Activations Density: 0.236%