INDEX
Explanations
references to caribou or variants of the word in various contexts
Negative Logits
Token      Logit
kl         -0.17
jury       -0.17
ktor       -0.15
ects       -0.15
OLON       -0.15
partial    -0.15
uncomp     -0.15
andard     -0.15
iram       -0.15
mitt       -0.14
Positive Logits
Token      Logit
thers      0.24
ibbean     0.24
bage       0.20
olina      0.20
inas       0.20
bohydr     0.20
oline      0.20
oload      0.19
ynn        0.19
isle       0.19
Activations Density: 0.029%