Explanations
phrases or expressions indicative of awareness or understanding of specific information
Negative Logits
Token            Logit
فريبيس           -0.56
Empereur         -0.49
İstinadlar       -0.47
咪               -0.44
intéress         -0.44
emper            -0.43
erapeu           -0.43
erçe             -0.42
invokingState    -0.42
writerow         -0.42
Positive Logits
Token            Logit
knowledge         1.24
knowledge         1.12
Knowledge         1.08
Knowledge         1.03
KNOWLEDGE         0.96
Know              0.92
knowing           0.90
Know              0.87
conocimiento      0.86
know              0.85
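The dashboard does not state how these logit lists are computed. A common approach for sparse-autoencoder features, assumed here, is to multiply the feature's decoder direction by the model's unembedding matrix and read off the most negative and most positive entries. The NumPy sketch below illustrates that assumption; `top_bottom_logits`, `w_dec_feature`, and `w_unembed` are hypothetical names, and the toy matrices are invented for illustration only.

```python
import numpy as np


def top_bottom_logits(w_dec_feature, w_unembed, vocab, k=10):
    """Rank vocabulary tokens by how strongly one feature pushes their logits.

    Assumptions (not stated on the dashboard):
      w_dec_feature: decoder direction of a single feature, shape (d_model,)
      w_unembed:     the model's unembedding matrix, shape (d_model, vocab_size)
      vocab:         token strings, indexed like the columns of w_unembed
    """
    # Direct effect of the feature direction on every output logit.
    logits = np.asarray(w_dec_feature) @ np.asarray(w_unembed)  # (vocab_size,)
    order = np.argsort(logits)  # token indices sorted from most negative upward
    negative = [(vocab[i], float(logits[i])) for i in order[:k]]
    positive = [(vocab[i], float(logits[i])) for i in order[::-1][:k]]
    return negative, positive


# Toy example: 2-dimensional residual stream, 4-token vocabulary (hypothetical).
vocab = ["a", "b", "c", "d"]
w_unembed = np.array([[2.0, -1.0, 0.5, 0.0],
                      [0.0, 0.0, 0.0, 0.0]])
w_dec_feature = np.array([1.0, 0.0])
neg, pos = top_bottom_logits(w_dec_feature, w_unembed, vocab, k=2)
print(neg)  # [('b', -1.0), ('d', 0.0)]
print(pos)  # [('a', 2.0), ('c', 0.5)]
```

With a real model, `k=10` would give two ten-entry lists shaped like the tables above, although the exact values depend on the model and on any normalization the dashboard may apply.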
Activation Density: 0.013% (the share of tokens on which the feature activates)
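The density figure is a simple ratio. The sketch below shows the arithmetic under two assumptions: there is one activation value per token in the evaluation sample, and a token counts as active when its value is above zero. `activation_density` is a hypothetical helper. A density of 0.013% equals 1.3 × 10⁻⁴, or roughly 130 active tokens per million.

```python
import numpy as np


def activation_density(activations, threshold=0.0):
    """Fraction of tokens on which a feature's activation exceeds `threshold`.

    `activations` is assumed to hold one activation value per token in the
    evaluation sample; the threshold of 0.0 is an illustrative choice.
    """
    acts = np.asarray(activations, dtype=float)
    return float(np.mean(acts > threshold))


# A density of 0.013% corresponds to about 130 active tokens per million.
acts = np.zeros(1_000_000)
acts[:130] = 0.7  # pretend the feature fired on 130 tokens
density = activation_density(acts)
print(f"{density:.3%}")  # 0.013%
```

A density this low means the feature fires rarely, which fits the narrow, knowledge-related explanation shown at the top of the dashboard.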