INDEX
Explanations
phrases related to making a point, expressing beliefs, and emphasizing ideas
conjunctions and phrases that denote relationships or connections
Negative Logits
Bones            -0.66
gorilla          -0.65
Aliens           -0.63
elsen            -0.60
minster          -0.59
Rouge            -0.59
Growth           -0.58
consolidation    -0.58
sterdam          -0.57
Cavs             -0.57
Positive Logits
Ïī               0.77
sidx             0.71
isable           0.70
Initialized      0.70
itten            0.70
ÃŁ               0.68
osher            0.66
athed            0.66
abal             0.65
sic              0.65
Activations Density: 0.758%