Explanations
phrases indicating awareness or comprehension
Negative Logits
__":            -0.58
AssemblyTitle   -0.54
Ђ               -0.53
__":            -0.52
__':            -0.52
الحره           -0.51
tiérrez         -0.50
geführt         -0.49
formis          -0.49
tille           -0.48
Positive Logits
WriteAttribute  0.68
guenos          0.67
knows           0.66
conozco         0.65
tudom           0.64
know            0.64
BeginContext    0.64
IconData        0.63
我知道          0.63
understands     0.62
Activations Density: 0.112%