INDEX
Explanations
references to different types of media such as TV shows, movies, books, and anime
Negative Logits
dden            -0.67
Helpful         -0.59
forgetting      -0.59
sbm             -0.56
Unknown         -0.56
Redd            -0.55
Wrong           -0.55
て              -0.54
dropping        -0.54
understatement  -0.54
Positive Logits
consists        1.29
consisted       1.28
comprises       1.16
revolves        1.15
debuted         1.15
contains        1.03
boasts          1.02
premiered       1.02
underwent       1.01
originated      1.01
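The logit lists rank vocabulary tokens by how strongly this feature promotes (positive) or suppresses (negative) them as the next token. Below is a minimal sketch of one common way such lists are produced. It assumes each value is the feature's decoder direction projected directly through the model's unembedding matrix. The function name `top_logit_effects` and the toy data are hypothetical, and the dashboard may apply extra normalization not shown here.

```python
import numpy as np


def top_logit_effects(w_dec_row, w_u, vocab, k=10):
    """Rank tokens by the direct logit effect of one feature (assumed method).

    w_dec_row: (d_model,) decoder vector for a single feature (assumption).
    w_u:       (d_model, vocab_size) unembedding matrix (assumption).
    vocab:     list of token strings of length vocab_size.
    """
    # Adding the feature direction to the residual stream shifts each
    # token's logit by its dot product with that token's unembedding column.
    effects = w_dec_row @ w_u  # shape: (vocab_size,)
    order = np.argsort(effects)  # ascending: most negative first
    negative = [(vocab[i], float(effects[i])) for i in order[:k]]
    positive = [(vocab[i], float(effects[i])) for i in order[::-1][:k]]
    return negative, positive


if __name__ == "__main__":
    # Toy example: an identity unembedding makes each effect equal the
    # corresponding decoder entry, so the ranking is easy to check by eye.
    vocab = ["consists", "Helpful", "dden", "the"]
    w_u = np.eye(4)
    w_dec_row = np.array([1.29, -0.59, -0.67, 0.0])
    neg, pos = top_logit_effects(w_dec_row, w_u, vocab, k=2)
    print(neg)
    print(pos)
```

The negative list is read from the low end of the sorted effects and the positive list from the high end, which matches how the two columns above are ordered.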
Activation Density 0.324%
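Activation density is the share of tokens on which the feature fires. The sketch below is minimal and assumes that "fires" means an activation strictly above zero. The function name `activation_density` is hypothetical. At 0.324%, the feature would be active on roughly 3 tokens in every 1,000, or about 3,240 tokens per million.

```python
import numpy as np


def activation_density(activations, threshold=0.0):
    """Fraction of tokens whose feature activation exceeds `threshold`.

    activations: 1-D array-like of per-token activations for one feature.
    threshold:   assumed firing cutoff (strictly greater than this value).
    """
    acts = np.asarray(activations, dtype=float)
    # Mean of a boolean mask gives the fraction of True entries.
    return float((acts > threshold).mean())


if __name__ == "__main__":
    acts = [0.0, 0.0, 0.5, 0.0, 1.2]
    print(f"{activation_density(acts):.1%}")
```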