INDEX
Explanations
phrases indicating universality or consistency across all entities
phrases that indicate widespread or collective situations
Negative Logits
Siber    -0.81
uala     -0.79
Rout     -0.67
Berk     -0.66
Mub      -0.65
Ô        -0.64
eport    -0.63
Ads      -0.63
Frie     -0.63
anson    -0.62
Positive Logits
notch           0.73
stairs          0.64
lihood          0.64
etheless        0.63
isphere         0.62
improvement     0.62
rust            0.60
pathological    0.59
atics           0.58
equation        0.57
Activations Density: 0.054%