INDEX
Explanations
assertions related to identity and existence
Negative Logits
الحره          -0.89
CloseOperation  -0.83
виправивши      -0.81
pinulongan      -0.80
bezeichneter    -0.79
AsUp            -0.76
ungkin          -0.75
+#+#            -0.74
فريبيس          -0.73
IsContent       -0.73
Positive Logits
!)      0.75
!!!)    0.71
ENTIRE  0.66
(!)     0.64
(!)     0.63
!!)     0.62
!),     0.61
ONLY    0.61
!).     0.60
!       0.59
Activations Density 0.115%