Explanations
segments of text or phrases that contain dashes or bullet points
Negative Logits
Hass        -0.70
iliary      -0.66
Galile      -0.65
basil       -0.65
Madden      -0.63
ously       -0.63
sclerosis   -0.62
oun         -0.62
activ       -0.61
cavity      -0.61
Positive Logits
fuck        0.98
something   0.91
ITS         0.90
notes       0.89
lance       0.88
meaning     0.87
sil         0.87
like        0.86
requ        0.85
reviewed    0.84
Activations Density: 0.024%