INDEX
Explanations
phrases related to individual responsibility or exclusivity
Negative Logits
horm            -0.72
Caf             -0.69
Everywhere      -0.67
Machines        -0.67
naires          -0.64
Eggs            -0.64
Kids            -0.63
arthy           -0.63
Drugs           -0.63
Laboratories    -0.63
Positive Logits
propriet        1.25
occupant        0.96
purpose         0.92
superpower      0.89
mn              0.87
exception       0.87
ieth            0.86
sole            0.83
reliance        0.82
handedly        0.80
Activations Density: 0.018%