INDEX
Explanations
phrases or sentences expressing strong emotions or opinions
expressions of strong positive feelings
Negative Logits
antry   -0.75
heid    -0.75
ansas   -0.72
lain    -0.70
ucky    -0.68
throp   -0.68
oire    -0.67
reath   -0.66
utions  -0.66
allas   -0.65
Positive Logits
darn         0.80
appreciated  0.79
bothered     0.78
FTWARE       0.78
liked        0.76
bothering    0.74
distressed   0.73
needed       0.72
appreciate   0.72
ignment      0.71
Activations Density: 0.043%