INDEX
Explanations
phrases related to indicating direction or focus
references to guidance or direction
Negative Logits
Token        Logit
distraction  -0.62
ornia        -0.58
ment         -0.57
Crush        -0.57
headlines    -0.56
cele         -0.55
ãĥ¡          -0.55
Mansion      -0.55
GHC          -0.53
Aram         -0.53
Positive Logits
Token        Logit
oward        0.86
heit         0.86
ggle         0.80
geon         0.77
forth        0.76
athe         0.72
ysc          0.71
onge         0.70
arily        0.69
ugh          0.68
Activations Density: 0.385%