Explanations
phrases related to literal interpretations or descriptions
distinctions between literal and figurative language
Negative Logits
reau         -0.76
Crash        -0.74
pered        -0.73
edu          -0.72
Rated        -0.69
angan        -0.66
afety        -0.66
meet         -0.66
sponsored    -0.66
Recommend    -0.66
Positive Logits
orical            0.97
literal           0.87
figur             0.85
Meaning           0.83
orically          0.79
meanings          0.79
ãĤ¨ãĥ«            0.79
TY                0.78
meaning           0.78
interpretation    0.75
Activations Density 0.020%
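
The `ãĤ¨ãĥ«` entry in the positive logits list looks like a token printed in byte-level BPE notation rather than as readable text. The sketch below shows one way to decode such a string, assuming (this is not stated in the listing) that the tokenizer uses the GPT-2 style byte-to-character mapping. The function names `bytes_to_unicode` and `decode_token` are illustrative, not part of any dashboard API.

```python
def bytes_to_unicode():
    """Build the GPT-2 style map from byte values (0-255) to printable characters.

    Assumption: the model behind this listing uses this byte-level BPE scheme.
    """
    # Bytes that are already printable map to themselves.
    bs = (
        list(range(ord("!"), ord("~") + 1))
        + list(range(ord("¡"), ord("¬") + 1))
        + list(range(ord("®"), ord("ÿ") + 1))
    )
    cs = bs[:]
    n = 0
    # Remaining bytes are shifted to code points starting at 256.
    for b in range(256):
        if b not in bs:
            bs.append(b)
            cs.append(256 + n)
            n += 1
    return dict(zip(bs, [chr(c) for c in cs]))


def decode_token(token):
    """Turn a byte-level token string back into readable UTF-8 text."""
    char_to_byte = {ch: b for b, ch in bytes_to_unicode().items()}
    raw = bytes(char_to_byte[ch] for ch in token)
    # A single token may hold an incomplete UTF-8 sequence, so replace errors.
    return raw.decode("utf-8", errors="replace")


if __name__ == "__main__":
    print(decode_token("ãĤ¨ãĥ«"))
```

Under that assumption the entry decodes to the katakana string エル. Plain ASCII entries such as `literal` pass through unchanged.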