INDEX
Explanations
phrases related to directions or orientations
Negative Logits
esters    -0.74
ammy      -0.68
aqu       -0.66
itted     -0.66
Surviv    -0.63
Byrne     -0.61
sung      -0.61
Mini      -0.61
Jenner    -0.61
odied     -0.61
Positive Logits
direction     1.33
directions    1.14
ality         1.07
towards       0.89
ward          0.89
finding       0.88
ational       0.87
toward        0.86
ally          0.82
Directions    0.82
Activations Density: 0.024%