INDEX
Explanations
long, thoughtful statements or discussions on various topics
Negative Logits
ares         -0.74
strip        -0.65
eal          -0.62
assisted     -0.62
buster       -0.62
bies         -0.60
endeavors    -0.60
achi         -0.60
icut         -0.57
bombed       -0.55
Positive Logits
plenty          1.15
ample           0.94
overlap         0.89
precedent       0.87
no              0.85
similarities    0.83
lots            0.81
unanim          0.81
disagreement    0.80
variability     0.77
Activations Density: 2.842%