Explanations
phrases related to legal or political contexts
instances of a specific character or symbol in the text
Negative Logits
shape          -0.87
undecided      -0.77
likeness       -0.77
imagination    -0.76
unborn         -0.76
untouched      -0.75
unconscious    -0.72
unwanted       -0.72
independ       -0.71
ponder         -0.70
Positive Logits
ï¸ı            1.00
ï¸             0.91
Similarly      0.86
ttp            0.86
âϦ             0.84
However        0.81
Likewise       0.81
İ              0.81
tab            0.81
Previous       0.81
Activations Density 0.210%