Explanations
phrases related to rules, decisions, and authoritative actions
Negative Logits
sling      -0.72
Tau        -0.61
Plate      -0.60
Launch     -0.60
booze      -0.60
Moose      -0.60
Scan       -0.59
Cure       -0.59
nutshell   -0.58
Lounge     -0.57
Positive Logits
existed     0.99
exists      0.96
exist       0.87
manship     0.84
idious      0.81
SHIP        0.78
hip         0.72
ieties      0.70
roman       0.69
itarian     0.69
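Dashboards like this one commonly build the logit lists above by projecting the feature's decoder direction through the model's unembedding matrix, then keeping the most negative and most positive tokens. The page does not say how its scores were computed, so treat the following as a minimal sketch under that assumption. `decoder_dir`, `W_U`, `logit_effects` and `top_tokens` are hypothetical names, and the toy weights are invented for illustration rather than taken from this feature.

```python
from typing import List, Sequence, Tuple


def logit_effects(decoder_dir: Sequence[float],
                  W_U: Sequence[Sequence[float]]) -> List[float]:
    """Score every vocabulary token as decoder_dir @ W_U.

    Assumed inputs: decoder_dir is the feature's decoder direction (length
    d_model); W_U is the unembedding matrix stored as d_model rows of
    vocab_size entries.
    """
    d_model = len(decoder_dir)
    vocab_size = len(W_U[0])
    return [
        sum(decoder_dir[i] * W_U[i][t] for i in range(d_model))
        for t in range(vocab_size)
    ]


def top_tokens(scores: Sequence[float], vocab: Sequence[str], k: int,
               positive: bool = True) -> List[Tuple[str, float]]:
    """Return the k tokens with the largest (positive=True) or smallest scores."""
    order = sorted(range(len(scores)), key=lambda t: scores[t], reverse=positive)
    return [(vocab[t], scores[t]) for t in order[:k]]


# Toy example with made-up numbers: d_model = 2, vocabulary of 4 tokens.
vocab = ["exists", "sling", "the", "Tau"]
decoder_dir = [1.0, -0.5]
W_U = [[0.9, -0.6, 0.1, -0.5],
       [-0.1, 0.2, 0.2, 0.2]]

scores = logit_effects(decoder_dir, W_U)
print("positive:", top_tokens(scores, vocab, k=1, positive=True))
print("negative:", top_tokens(scores, vocab, k=2, positive=False))
```

Sorting indices rather than (token, score) pairs keeps ties in vocabulary order. With a real model you would replace the nested loops with a single matrix-vector product.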
Activation Density: 0.144%
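A density of 0.144% means the feature is active on roughly 1.44 of every 1,000 sampled tokens. The page does not state its firing threshold, so this sketch assumes a token counts as "active" when its activation is strictly above 0.0. The function name `activation_density` and the activation values are hypothetical and are not taken from this feature.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Fraction of tokens on which the feature's activation exceeds `threshold`.

    Assumed definition: a token "fires" when its activation is strictly above
    the threshold (0.0 by default, as for a ReLU-style feature).
    """
    if len(activations) == 0:
        raise ValueError("need at least one token activation")
    fired = sum(1 for a in activations if a > threshold)
    return fired / len(activations)


# Made-up activations: the feature fires on 2 of 1,000 tokens.
acts = [0.0] * 998 + [2.3, 0.7]
density = activation_density(acts)
print(f"{density:.3%}")
```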