INDEX
Explanations
concepts related to knowledge and understanding
Negative Logits
tigs      -0.37
novel     -0.34
acta      -0.33
passé     -0.32
passée    -0.32
ğlık      -0.31
tack      -0.30
dom       -0.30
curio     -0.30
novels    -0.30
Positive Logits
knows           0.82
Knows           0.78
knows           0.77
knowing         0.76
OGND            0.75
knew            0.75
Knows           0.74
conscientes     0.73
ProtoMessage    0.72
understands     0.72
Activations Density: 0.360%