INDEX
Explanations
phrases indicating comparison or equivalence
repetitions of the word "just" across various contexts
Negative Logits
sidx          -0.70
seiz          -0.66
>>>>>>>>      -0.59
ership        -0.59
Also          -0.59
trave         -0.57
Cooperation   -0.56
seek          -0.55
apolis        -0.55
necks         -0.54
Positive Logits
ifiable       1.17
ifications    1.08
if            0.92
ices          0.91
ified         0.89
IFIED         0.83
icia          0.83
desserts      0.79
shy           0.78
fine          0.78
Activations Density 0.081%