Explanations
references to the concept of the mind
references to the concept of 'mind' and its variations
Negative Logits
Vide       -0.70
Mehran     -0.69
TAIN       -0.68
Aval       -0.67
Recomm     -0.66
Legend     -0.65
ICAN       -0.63
Reviewed   -0.62
Carly      -0.62
aukee      -0.58
Positive Logits
fulness    1.20
lessly     1.14
storms     1.12
share      1.09
bender     1.07
fuck       1.06
scape      1.05
ful        0.99
blow       0.99
fully      0.98
Activations Density: 0.034%