INDEX
Explanations
phrases related to spatial concepts and physical sensations
instances of legal or official proceedings
Negative Logits
nightly        -0.88
coral          -0.80
swe            -0.78
taste          -0.78
thrill         -0.77
dance          -0.77
goalie         -0.75
dynam          -0.75
fancy          -0.74
butterflies    -0.74
Positive Logits
According        1.64
Regarding        1.60
However          1.54
Furthermore      1.54
Moreover         1.48
Asked            1.47
Advertisement    1.45
Comment          1.45
Nevertheless     1.43
Section          1.43
Activations Density: 0.561%