INDEX
Explanations
proper nouns or specific terms/phrases
phrases that indicate examples or specifications of previously mentioned concepts
Negative Logits
tears      -0.58
ou         -0.58
ater       -0.57
next       -0.56
ickets     -0.55
luck       -0.55
im         -0.55
imeter     -0.54
ort        -0.54
imming     -0.53
Positive Logits
namely         3.95
viz            2.53
Firstly        1.45
secondly       1.42
notably        1.42
ie             1.35
Specifically   1.28
whereby        1.26
Including      1.23
albeit         1.19
Activations Density 0.009%