INDEX
Explanations
sentences describing observation, evaluation, or realization
phrases indicating perception or observation
Negative Logits
Token         Logit
ufact         -0.96
angan         -0.74
depended      -0.69
lette         -0.64
Huntington    -0.63
aga           -0.62
ulo           -0.62
STE           -0.59
}{            -0.59
atton         -0.59
Positive Logits
Token         Logit
VIDEOS        0.91
firsthand     0.85
unfold        0.83
ideos         0.75
videos        0.72
markets       0.71
therapist     0.70
positives     0.66
resemblance   0.65
deen          0.64
Activations Density: 0.248%