Explanations
phrases related to psychological concepts or introspection
Negative Logits
Licensed     -0.68
Vide         -0.64
Recomm       -0.63
DIR          -0.63
NRS          -0.62
Mehran       -0.61
reek         -0.61
ModLoader    -0.61
Reviewed     -0.60
Slay         -0.60
Positive Logits
fulness       1.27
storms        1.12
bender        1.06
share         1.03
scape         0.98
lessly        0.98
ets           0.96
ful           0.95
fuck          0.92
iac           0.92
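The two lists above rank vocabulary tokens by how strongly this feature pushes their logits down or up. A minimal sketch of how such lists can be produced, assuming (the dashboard does not say so) that each token's score is the dot product of the feature's decoder direction with that token's unembedding vector. The function name `top_logit_effects` and the toy vectors are hypothetical and for illustration only.

```python
from typing import Dict, List, Tuple


def top_logit_effects(
    decoder_dir: List[float],
    unembed: Dict[str, List[float]],
    k: int = 10,
) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return the k most negative and k most positive token scores.

    Assumption: a token's score is dot(decoder_dir, unembed[token]).
    """
    # Score every vocabulary token against the feature's decoder direction.
    scores = {
        tok: sum(d * u for d, u in zip(decoder_dir, vec))
        for tok, vec in unembed.items()
    }
    # Sort ascending so the most suppressed tokens come first.
    ranked = sorted(scores.items(), key=lambda kv: kv[1])
    negative = ranked[:k]          # most negative scores first
    positive = ranked[::-1][:k]    # most positive scores first
    return negative, positive


# Toy example with a 2-dimensional "model" (hypothetical vectors).
neg, pos = top_logit_effects(
    [1.0, 0.0],
    {"fulness": [1.2, 0.0], "Licensed": [-0.7, 0.5], "share": [1.0, 3.0]},
    k=1,
)
print(neg, pos)
```

In a real model the decoder direction and unembedding would have thousands of dimensions and the vocabulary tens of thousands of entries; sorting once and slicing both ends keeps the sketch short.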
Activations Density: 0.028%
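A density of 0.028% means the feature fires on roughly 28 of every 100,000 tokens. A minimal sketch of the calculation, assuming density is the share of token positions whose activation exceeds zero over some sample of text; the dashboard does not state the dataset or threshold, and the function name `activation_density` is hypothetical.

```python
from typing import Iterable


def activation_density(activations: Iterable[float], threshold: float = 0.0) -> float:
    """Fraction of token positions where the activation exceeds `threshold`.

    Assumption: "firing" means activation > threshold (default 0.0).
    """
    total = 0
    active = 0
    for a in activations:
        total += 1
        if a > threshold:
            active += 1
    # Guard against an empty sample.
    return active / total if total else 0.0


# 28 firing positions out of 100,000 tokens.
sample = [0.0] * 99_972 + [1.5] * 28
density = activation_density(sample)
print(f"{density:.3%}")
```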