INDEX
Explanations
words indicating emphasis or importance
Negative Logits
hou       -0.74
ysis      -0.72
…]        -0.72
ipop      -0.72
ously     -0.70
dal       -0.65
iste      -0.65
rette     -0.64
IGH       -0.64
rences    -0.64
Positive Logits
_-            0.86
namely        0.70
yes           0.64
->            0.63
albeit        0.61
without       0.60
aka           0.60
Instruments   0.59
especially    0.59
which         0.59
Activations Density 0.048%