INDEX
Explanations
special characters indicating a specific type of formatting or encoding in text
occurrences of the character "Ŀ"
Negative Logits
shape          -0.79
warr           -0.79
ende           -0.77
undecided      -0.76
likeness       -0.75
imagination    -0.73
unconscious    -0.73
resemblance    -0.72
unborn         -0.71
barg           -0.70
Positive Logits
ï¸ı            1.09
ï¸             1.00
Additionally   0.91
However        0.90
Similarly      0.88
Additionally   0.87
ttp            0.87
cue            0.86
Unless         0.84
¯              0.84
Activations Density: 0.184%