INDEX
Explanations
phrases that are emphasized with quotation marks
Negative Logits
Ross       -0.78
XL         -0.77
XI         -0.77
Isaac      -0.76
Mehran     -0.76
rall       -0.75
reel       -0.75
chant      -0.74
Creator    -0.73
Amar       -0.73
Positive Logits
significant    1.78
multiple       1.68
serious        1.66
many           1.65
very           1.62
extremely      1.61
sever          1.60
reasonable     1.60
certain        1.60
extra          1.58
Activations Density: 0.134%