INDEX
Explanations
references to specific items or objects
the word "these" in various contexts
Negative Logits
erate     -0.78
Disk      -0.73
hood      -0.73
VW        -0.72
igation   -0.71
atform    -0.71
iness     -0.70
achus     -0.70
herty     -0.69
esm       -0.69
Positive Logits
guys       0.94
fellows    0.84
kinds      0.80
sights     0.79
lovely     0.76
Situation  0.76
darn       0.76
nifty      0.74
sorts      0.74
crazy      0.69
Activations Density: 0.095%