INDEX
Explanations
phrases ending with a sentence completion indicator like a period
special characters or formatting elements in the text
Negative Logits
nown                -0.69
disg                -0.68
jri                 -0.65
externalToEVAOnly   -0.65
botched             -0.64
disadvant           -0.62
ortium              -0.60
oldown              -0.60
anwhile             -0.60
successors          -0.59
Positive Logits
âĢº                  1.00
Phys                 0.80
Updated              0.76
Transcript           0.74
SHARES               0.71
Originally           0.67
Favorite             0.67
Welcome              0.65
Calculator           0.65
·                    0.65
Activations Density 0.682%