INDEX
Explanations
phrases related to descriptions of events happening alongside something else
NEGATIVE LOGITS
asse      -0.79
jab       -0.79
̶         -0.72
ortex     -0.72
oria      -0.71
ck        -0.69
nexus     -0.69
ggle      -0.68
bell      -0.67
ahime     -0.67
POSITIVE LOGITS
others      0.78
fellow      0.74
hers        0.73
lihood      0.72
him         0.69
other       0.68
ours        0.68
yours       0.66
colleagues  0.66
pins        0.65
Activations Density 0.041%