Explanations
comments sections in texts
Negative Logits
planes      -0.73
plane       -0.71
ously       -0.67
ALLY        -0.66
flies       -0.66
ment        -0.66
points      -0.62
lines       -0.61
aways       -0.60
lessness    -0.60
Positive Logits
ource       1.09
cript       1.09
poons       1.08
heet        0.96
ystem       0.94
ometimes    0.94
ensitive    0.93
uggest      0.91
ettings     0.90
ugar        0.90
Activations Density: 0.119%