INDEX
Explanations
positive evaluations or affirmations
instances of the word "good" and its context related to positive outcomes or sentiments
Negative Logits
Token      Logit
eters      -0.71
gemony     -0.71
Strait     -0.71
oths       -0.71
fty        -0.71
Pavilion   -0.69
uthor      -0.69
opers      -0.66
edIn       -0.66
ategory    -0.66
Positive Logits
Token      Logit
enough     1.34
bye        1.22
luck       1.16
reads      1.15
enough     1.11
luck       1.10
ol         1.07
Samar      1.06
night      0.98
Enough     0.94
Activations Density 0.072%