INDEX
Explanations
literal interpretations or expressions
references to literal and metaphorical concepts
Negative Logits
ulates      -0.71
Crash       -0.67
ramid       -0.67
arma        -0.65
shows       -0.64
Features    -0.64
Share       -0.64
Privacy     -0.63
spr         -0.63
company     -0.62
Positive Logits
literal         1.16
orical          0.82
analogue        0.80
canonical       0.73
interpretation  0.72
TY              0.71
brunt           0.70
ウス            0.70
incarnation     0.70
alogue          0.69
Activations Density 0.005%