Explanations
phrases indicating a particular thought, feeling, or perspective
expressions related to perceptions or feelings about situations
Negative Logits
oute        -0.77
ynski       -0.77
uster       -0.74
usters      -0.72
ividual     -0.71
sugg        -0.71
erville     -0.70
etheus      -0.69
ewitness    -0.66
iners       -0.64
Positive Logits
fare        0.85
ward        0.71
forward     0.69
forever     0.68
U+FE0F      0.66
lier        0.65
bill        0.64
fitting     0.64
footed      0.64
WARD        0.64
Activations Density 0.038%