Explanations
phrases and sentences that provide instructions or prompts
Negative Logits
Token       Logit
Palestin    -0.75
PRO         -0.66
Garfield    -0.64
Palest      -0.63
discrep     -0.62
田          -0.61
Amon        -0.59
Abu         -0.59
Lith        -0.58
omething    -0.58
Positive Logits
Token       Logit
checkout    0.70
う          0.68
eways       0.65
ECTION      0.65
issance     0.64
osta        0.63
illac       0.63
viewed      0.63
icators     0.63
vis         0.61
Activations Density: 0.041%