INDEX
Explanations
mentions of being impressed by something or someone
expressions of admiration or being impressed
Negative Logits
turn         -0.69
access       -0.68
rama         -0.68
violence     -0.67
extraction   -0.67
diverted     -0.63
abort        -0.61
leasing      -0.61
population   -0.61
build        -0.61
Positive Logits
impressed    1.15
resemb       0.85
sson         0.76
Wenger       0.74
surprised    0.73
MENTS        0.71
acebook      0.69
admire       0.69
oresc        0.69
GGGGGGGG     0.69
Activations Density: 0.009%