INDEX
Explanations
sentences related to personal beliefs or values
frequent references to "life" and its various conditions and contexts
Negative Logits
wi            -0.65
Hayden        -0.63
Preston       -0.62
yip           -0.58
arat          -0.58
azines        -0.58
repeatedly    -0.57
hops          -0.56
shops         -0.55
fielded       -0.54
Positive Logits
edly          0.93
enment        0.81
same          0.77
ATION         0.75
itarian       0.75
edIn          0.74
brunt         0.73
ATIONS        0.73
iously        0.73
ité           0.73
Activations Density: 0.375%