INDEX
Explanations
mentions of knowledge or expertise
references to knowledge and understanding in various contexts
Negative Logits
odder      -0.72
itter      -0.68
sty        -0.64
emale      -0.62
Pengu      -0.61
odd        -0.61
Temper     -0.59
issions    -0.58
dule       -0.57
raine      -0.56
Positive Logits
ledge           1.09
glean           0.99
gained          0.97
lege            0.93
about           0.93
fulness         0.91
base            0.90
comprehension   0.84
pertaining      0.84
base            0.82
Activations Density 0.060%