INDEX
Explanations
the word "thing"
references to the concept of "thing."
Negative Logits
Token            Logit
inav             -0.75
brids            -0.73
cling            -0.71
largeDownload    -0.67
ervation         -0.67
oufl             -0.66
irl              -0.65
ctic             -0.65
incinn           -0.65
ardi             -0.65
Positive Logits
Token            Logit
iverse           0.89
Else             0.89
happ             0.88
thing            0.87
happening        0.85
Thing            0.80
Valiant          0.79
happened         0.78
happens          0.73
imaginable       0.72
Activations Density: 0.026%